Abstract

Tries are among the most versatile and widely used data structures on words. In particular, 
they are used in fundamental sorting algorithms such as radix sort which we study in this 
paper. While the performance of radix sort and tries under a realistic probabilistic model for
the generation of words is of significant importance, its analysis, even for the simplest memoryless
sources, has proved difficult. In this paper we consider a more realistic model where words
are generated by a Markov source. By a novel use of the contraction method combined with 
moment transfer techniques we prove a central limit theorem for the complexity of radix sort 
and for the external path length in a trie. This is the first application of the contraction 
method to the analysis of algorithms and data structures with Markovian inputs; it relies on 
the use of systems of stochastic recurrences combined with a product version of the Zolotarev 
metric. 


1 Introduction 

Tries are prototype data structures useful for many indexing and retrieval purposes. Tries were 
first proposed by de-la-Briandais in 1959 [1] for information processing. Fredkin in 1960 suggested 
the current name, part of the word retrieval [2211251136] . They are pertinent to (internal) structure 
of (stored) words and several splitting procedures used in diverse contexts ranging from document 
taxonomy to IP addresses lookup, from data compression to dynamic hashing, from partial-match 
queries to speech recognition, from leader election algorithms to distributed hashing tables and 
graph compression. 

Tries are trees whose nodes are vectors of characters or digits; they are a natural choice of data 
structure when the input records involve the notion of alphabets or digits. Given a sequence of 
n binary strings, we construct a trie as follows. If n = 0 then the trie is empty. If n = 1 then a 
single external node holding the word is allocated. If n > 1 then the trie consists of a root (i.e., 
internal) node directing strings to two subtrees according to the first symbol of each string, and 
strings directed to the same subtree recursively generate a trie among themselves; see Figure 1
and the sections below for a more formal definition. The internal nodes in tries are branching nodes, used
merely to direct records to each subtrie; the record strings are all stored in external nodes, which
are leaves of such tries.

Figure 1: Radix sort (panel (a)) and a trie applied to the strings: S1 = 1101..., S2 = 0001..., S3 =
0110..., S4 = 0000..., S5 = 1111..., S6 = 1110... Note that radix sort places S1 into three
sublists, also called buckets (and has to read the first three symbols of S1), whereas the node
storing S1 has depth three in the corresponding trie.

Tries can be used in many fundamental algorithms, in particular for the sorting method known as radix
sort or, more precisely, most significant digit radix sort [22]. In this case, the n strings are binary
representations of the keys to be sorted. They are inserted in a trie as described above. A so-called
depth-first traversal of the trie starting at the root node will visit each key in sorted order. In
other words, keys that start with a 0 are moved to the left subtree, also called the left bucket, while
the other keys are stored in the right subtree or right bucket. Subsequently, we sort the keys in the
left and the right buckets using the second symbol, and so on, as shown in Figure 1(a). A recursive
description of the radix sort algorithm is presented in Section 2. In this paper, we shall use the
trie and radix sort paradigms interchangeably. The complexity of such a radix sort is equal to the
external path length of the associated trie, that is, the sum of the lengths of the paths from the
root to all external nodes.

We study the limit law of the radix sort complexity and the external path length of a trie built 
over n binary strings generated by a Markov source. More precisely, we assume that the input is 
a sequence of n independent and identically distributed random strings, each being composed of 
an infinite sequence of symbols such that the next symbol depends on the previous one and this 
dependence is governed by a given transition matrix (i.e., Markov model). 

Digital trees, in particular, tries have been intensively studied for the last thirty years [310II1 
miiaiioiiiiTiiiaiisiise], mostly under Bernoulli (memoryless) model assumption. The typical 
depth under the Markov model was analyzed in m, however, not the external path length. The 
external path length is more challenging due to stronger dependency, see [36]. In fact, this is 
already observed for tries under the Bernoulli model [36] . In this paper we establish a central 
limit theorem for the external path length in a trie built over a Markov model using a novel use 
of the contraction method. 

The contraction method was introduced in 1991 by Uwe Rösler m for the distributional
analysis of the complexity of the Quicksort algorithm. It was then developed independently by
Rösler and by Rachev and Rüschendorf [30] in the early 1990s. Over the last 20 years this
approach, which is based on exploiting an underlying contracting map on a space of probability
distributions, has been developed as a fairly universal tool for the analysis of recursive algorithms
and data structures. Here, randomness may come from a stochastic model for the input or from
randomization within the algorithm itself (randomized algorithms). General developments of this
method were presented in [32, 30, 33, 27, 28, 8, 19, 29] with numerous applications in computer
science, information theory, and networking.

The contraction method has been used in the analysis of tries and other data structures only
under the symmetric Bernoulli model (unbiased memoryless source) [27, Section 5.3.2], where
limit laws for the size and the external path length of tries were re-derived. The application of
the method there was heavily based on the fact that precise expansions of the expectations were
available, in particular smoothness properties of periodic functions appearing in the linear terms
as well as bounds on error terms which were $O(1)$ for the size and $O(\log n)$ for the path lengths.
It should be observed that even in the asymmetric Bernoulli model such error terms seem to be
out of reach for classical analytic methods; see the discussion in Flajolet, Roux, and Vallée [3].
Hence, for the more general Markov source model considered in the present paper we develop a 
novel use of the contraction method. 

Furthermore, the contraction method applied to Markov sources hits another snag, namely, 
the Markov model is not preserved when decomposing the trie at its root into its left and right 
subtree. The initial distribution of the Markov source is changed when looking at these subtrees. 
To overcome these problems a couple of new ideas are used for setting up the contraction method: 
First of all, we will use a system of distributional recursive equations, one for each subtree. We 
then apply the contraction method to this system of recurrences capturing the subtree processes 
and prove asymptotic normality for the path lengths conditioned on the initial distribution. In 
fact, our approach avoids dealing with multivariate recurrences and instead we reduce the whole 
analysis to a system of one-dimensional equations. To come up with an appropriate contracting 
map we use a product version of the Zolotarev metric. 

We also need asymptotic expansions of the mean and the variance for applying the contraction
method. In contrast to the very precise information on periodicities of linear terms available for the
symmetric Bernoulli model, and in view of the results in [3] mentioned above, we cannot
expect to obtain similarly precise expansions. In fact, our convergence proof requires only
the leading order term together with a Lipschitz continuity property for the error term. The lack
of a precise expansion is compensated by this Lipschitz continuity combined with a self-centering
argument to obtain sufficiently tight control on error terms.

For the derivation of such expansions of the mean (and the variance) we use moment transfer
theorems. Such theorems were largely developed by H.-K. Hwang, see, e.g., [nnniiiiiii], for the 
control of moments related to one-dimensional recurrences. We extend such theorems to systems 
of recurrences as they occur for the analysis of our Markov model. For the expansion of the 
variance we also make use of a construction due to Schachinger [35]. 

This is the first application of the contraction method to the analysis of algorithms and data 
structures with Markovian inputs. Our results were announced in the extended abstract [24]. The 
methodology developed is general enough to cover related quantities and structures as well. Our 
approach also applies with minor adjustments at least to the path lengths of digital search trees 
and PATRICIA tries under the Markov source model, see the dissertation of the first mentioned 
author [23] . 

The Markov source model is more realistic and more flexible than the (memoryless) Bernoulli 
model. Even more general models have been analyzed in the context of tries. Vallée m introduced
the dynamical source models which, in particular, cover the Markov model. The analysis
of dynamical sources for tries started with the work of Clément, Flajolet and Vallée in 0, including
the asymptotics of the expectation of several trie parameters such as height, size and the
depth/external path length. There is a limit theorem for the depth in tries for special (so-called
tame) dynamical sources, see [3], and a limit theorem for the depth in the (closely related) digital
search tree for two types of general sources, see m. However, a limit theorem for the external
path length in tries and the complexity of radix sort has not yet been derived for dynamical sources.

Notations: Throughout this paper we use the Bachmann-Landau symbols, in particular the big
$O$ notation. We declare $x \log x := 0$ for $x = 0$. By $B(n,p)$ with $n \in \mathbb{N}$ and $p \in [0,1]$ the binomial
distribution is denoted, by $B(p)$ the Bernoulli distribution with success probability $p$, by $\mathcal{N}(0,\sigma^2)$
the centered normal distribution with variance $\sigma^2 > 0$. We use $C$ as a generic constant that may
change from one occurrence to another.
2 Main Results


In this section we first succinctly describe radix sort and its relation to tries. Then we present
our probabilistic model and the main result of this paper.


Radix sort. Given n keys represented by binary strings, we can sort them in the following way.
We first split them according to the first bit: those strings starting with a 0 go to the left bucket,
while the others go to the right bucket. In each bucket we sort the remaining strings in the same manner
using the second bit, and so on. At the end we read all keys from left to right and all n keys are
sorted, see Figure 1. This is called radix sort [22]. The number of inspected bits needed to sort
such n keys (strings) is denoted by $B_n$ and called, in short, the number of bucket operations. It
measures the complexity of radix sort. We study its limiting distribution in this paper.

It is easy to see that we can achieve the same result by building a trie from the n strings and visiting
all external nodes in a tree traversal. Then $B_n$ can be interpreted as the external
path length, that is, the sum of the lengths of all paths from the root to the external nodes.
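To make the correspondence between bucket operations and the external path length concrete, the following Python sketch runs most significant digit radix sort on the six strings of Figure 1 and counts the inspected bits. It is only an illustration: the strings are truncated to their first four bits (an assumption made here so that the input is finite), and the function names are hypothetical, not taken from the paper.

```python
def split_bucket(keys, depth):
    """Split a bucket by the bit at position `depth` (0-based)."""
    if any(len(key) <= depth for key in keys):
        raise ValueError("keys must be distinct and long enough to be separated")
    left = [key for key in keys if key[depth] == "0"]
    right = [key for key in keys if key[depth] == "1"]
    return left, right


def bucket_operations(keys, depth=0):
    """Number of bits inspected by MSD radix sort on distinct binary strings.

    This equals the external path length of the trie built from `keys`.
    """
    if len(keys) <= 1:
        return 0  # an empty bucket or a single key needs no further inspection
    left, right = split_bucket(keys, depth)
    # Each key in this bucket has exactly one bit inspected at this level.
    return (len(keys)
            + bucket_operations(left, depth + 1)
            + bucket_operations(right, depth + 1))


def radix_sort(keys, depth=0):
    """Return the keys in sorted order (left bucket first, then right bucket)."""
    if len(keys) <= 1:
        return list(keys)
    left, right = split_bucket(keys, depth)
    return radix_sort(left, depth + 1) + radix_sort(right, depth + 1)


# The strings S1, ..., S6 of Figure 1, truncated to four bits.
keys = ["1101", "0001", "0110", "0000", "1111", "1110"]
print(bucket_operations(keys))
print(radix_sort(keys))
```

For these six strings the count is 21, which agrees with the sum of the depths 3, 4, 2, 4, 4, 4 of the external nodes holding S1, ..., S6 in the trie.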


The Markov source: We now define the probabilistic model for string generation. We shall
assume that binary data strings over the alphabet $\Sigma = \{0,1\}$ are generated by a homogeneous
Markov source. In general, a homogeneous Markov chain is given by its initial distribution
$\mu = \mu_0\delta_0 + \mu_1\delta_1$ on $\Sigma$ and the transition matrix $(p_{ij})_{i,j\in\Sigma}$. Here, $\delta_x$ denotes the Dirac measure in $x \in \mathbb{R}$.
Hence, the initial state is 0 with probability $\mu_0$ and 1 with probability $\mu_1$. We have $\mu_0, \mu_1 \in [0,1]$
and $\mu_0 + \mu_1 = 1$. A transition from state $i$ to $j$ happens with probability $p_{ij}$, $i,j \in \Sigma$. Now,
a data string is generated as the sequence of states visited by the Markov chain. In the Markov
source model assumed subsequently all data strings are independent and identically distributed
according to the given Markov chain.

We always assume that $p_{ij} > 0$ for all $i,j \in \Sigma$. Hence, the Markov chain is ergodic and has a
stationary distribution, denoted by $\pi = \pi_0\delta_0 + \pi_1\delta_1$. We have

$$\pi_0 = \frac{p_{10}}{p_{01} + p_{10}}, \qquad \pi_1 = \frac{p_{01}}{p_{01} + p_{10}}. \tag{1}$$


Note however, that our Markov source model does not require the Markov chain to start in its 
stationary distribution. 

The case $p_{ij} = 1/2$ for all $i,j \in \Sigma$ is essentially the symmetric Bernoulli model (only the first
bit may have a different initial distribution). The symmetric Bernoulli model has already been 
studied thoroughly also with respect to the external path length of tries; see [iniiiTj. Hence, we 
exclude this case subsequently. For later reference, we summarize our conditions as: 

$$p_{ij} \in (0,1) \text{ for all } i,j \in \Sigma, \qquad p_{ij} \neq \tfrac{1}{2} \text{ for some } (i,j) \in \Sigma^2. \tag{2}$$


The entropy rate of the Markov chain plays an important role in the asymptotic behavior 
of the performance of radix sort. In particular, it determines the leading order constant of the 
average number of bucket operations (path length) performed by radix sort. The entropy rate for 
our Markov chain is given by 


$$H := -\sum_{i,j\in\Sigma} \pi_i\, p_{ij} \log p_{ij} = \sum_{i\in\Sigma} \pi_i H_i, \tag{3}$$

where $H_i := -\sum_{j\in\Sigma} p_{ij}\log p_{ij}$ is the entropy of a transition from state $i$ to the next state. Thus,
$H$ is obtained as the weighted average of the entropies of all possible transitions with weights according
to the stationary distribution $\pi$.
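The stationary distribution and the entropy rate are simple to evaluate numerically. The sketch below is a minimal illustration under two assumptions made here: logarithms are natural, and the chain is parametrized by p00 and p10 (so that p01 = 1 - p00 and p11 = 1 - p10). The function names are hypothetical.

```python
from math import log


def stationary_distribution(p00, p10):
    """Stationary distribution (pi0, pi1) of the two-state chain."""
    p01 = 1.0 - p00
    total = p01 + p10
    return p10 / total, p01 / total


def transition_entropy(p):
    """Entropy of a single transition with probabilities (p, 1 - p), 0 < p < 1."""
    return -(p * log(p) + (1.0 - p) * log(1.0 - p))


def entropy_rate(p00, p10):
    """Entropy rate H = pi0 * H0 + pi1 * H1 of the Markov chain."""
    pi0, pi1 = stationary_distribution(p00, p10)
    return pi0 * transition_entropy(p00) + pi1 * transition_entropy(p10)


pi0, pi1 = stationary_distribution(0.7, 0.2)
print(pi0, pi1, entropy_rate(0.7, 0.2))
```

In the symmetric Bernoulli case p00 = p10 = 1/2 the entropy rate reduces to log 2, as expected for unbiased independent bits.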


Our main result concerning the distribution of the number of bucket operations in radix sort
or the path length in a trie is presented next. We will write $B_n^\mu$ for $B_n$ to make its dependence on
the initial distribution explicit.
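Theorem 2.1. The number $B_n^\mu$ of bucket operations under the Markov source model with conditions
(2) satisfies, as $n \to \infty$,

$$\mathbb{E}[B_n^\mu] = \frac{1}{H}\, n\log n + O(n), \qquad \mathrm{Var}(B_n^\mu) = \sigma^2 n\log n + O\big(n\sqrt{\log n}\big),$$

where the entropy rate $H$ is defined in (3) and $\sigma^2$ is given by

$$\sigma^2 = \frac{\pi_0\, p_{00}\, p_{01}}{H^3}\left(\log(p_{00}/p_{01}) + \frac{H_1 - H_0}{p_{01} + p_{10}}\right)^2 + \frac{\pi_1\, p_{10}\, p_{11}}{H^3}\left(\log(p_{10}/p_{11}) + \frac{H_1 - H_0}{p_{01} + p_{10}}\right)^2.$$

Moreover, as $n \to \infty$,

$$\frac{B_n^\mu - \mathbb{E}[B_n^\mu]}{\sqrt{\mathrm{Var}(B_n^\mu)}} \xrightarrow{d} \mathcal{N}(0,1),$$

where $\mathcal{N}(0,1)$ denotes a random variable with the standard normal distribution.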
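The variance constant of Theorem 2.1 can be evaluated directly from the transition probabilities. The following sketch assumes natural logarithms and the reading of the formula as a sum over i in {0,1} of pi_i * p_i0 * p_i1 / H^3 times the square of log(p_i0/p_i1) + (H_1 - H_0)/(p_01 + p_10). The function name is hypothetical.

```python
from math import log


def sigma_squared(p00, p10):
    """Leading variance constant sigma^2 for transition probabilities p00, p10."""
    p01, p11 = 1.0 - p00, 1.0 - p10
    pi0 = p10 / (p01 + p10)
    pi1 = p01 / (p01 + p10)
    h0 = -(p00 * log(p00) + p01 * log(p01))   # entropy H_0 of a transition from 0
    h1 = -(p10 * log(p10) + p11 * log(p11))   # entropy H_1 of a transition from 1
    h = pi0 * h0 + pi1 * h1                   # entropy rate H
    shift = (h1 - h0) / (p01 + p10)
    term0 = pi0 * p00 * p01 * (log(p00 / p01) + shift) ** 2
    term1 = pi1 * p10 * p11 * (log(p10 / p11) + shift) ** 2
    return (term0 + term1) / h ** 3


print(sigma_squared(0.7, 0.2))
```

Two sanity checks follow from the formula itself. For p00 = p10 = 1/2 the constant vanishes, in line with the exclusion of the symmetric Bernoulli model in (2). For a memoryless source with p00 = p10 = p the shift term is zero and the constant reduces to p(1-p) log^2(p/(1-p)) / H^3.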


The analysis of $B_n^\mu$ is based on a system of recursive distributional equations discussed in the
next section. Section 4 contains some moment-transfer theorems that are used in the analysis of
mean and variance. These theorems are applied to the analysis of the mean in Section 5 in order to
derive the asymptotic expansion in Theorem 2.1 as well as a more detailed study of the remaining
term $f^\mu(n) := \mathbb{E}[B_n^\mu] - n\log n/H$, which is necessary to obtain the limit law in Section 7.

The first order asymptotic of $\mathrm{Var}(B_n^\mu)$ with uniform error term is derived in Section 6. It is
based on the moment-transfer theorems from Section 4 but requires some additional ideas such as
a splitting of $B_n^\mu$ into a suitable sum and a poissonization argument.

Finally, the limit theorem is established in Section 7. The proof is based on the contraction
method. In fact, the asymptotic analysis of the moments enables us to apply this technique. It
is possible to obtain a more detailed asymptotic expansion of the mean by analytic techniques;
however, without the analysis of the increments in Proposition 5.2 the analysis in Section 7 would
require an asymptotic expansion up to the order of $o(\sqrt{n\log n})$. It should be pointed out that
analytic techniques allow asymptotics of the mean and the variance up to $o(n)$ [36].


3 Recursive Distributional Equations 

We formulate in this section a system of distributional recurrences to capture the distribution of 
the number of bucket operations. Our subsequent analysis is entirely based on these equations. 
In the sequel, we phrase our discussion in terms of the radix sort algorithm. 

We denote by $B_n^\mu$ the number of bucket operations (i.e., number of bits inspected by radix
sort) performed sorting $n$ data under the Markov source model with initial distribution $\mu$, using
the radix sorting algorithm. We have $B_0^\mu = B_1^\mu = 0$ for all initial distributions $\mu$. The transition
matrix is given in advance and suppressed in the notation. We abbreviate $B_n^i := B_n^{\mu_i}$ for $i \in \Sigma$
and $\mu_i = p_{i0}\delta_0 + p_{i1}\delta_1$. We will study $B_n^0$ and $B_n^1$. From the asymptotic behavior of these
two sequences we can then directly obtain corresponding results for $B_n^\mu$ for an arbitrary initial
distribution $\mu = \mu_0\delta_0 + \mu_1\delta_1$ as follows (here the weights $\mu_0, \mu_1$ of $\mu$ are numbers and should
not be confused with the measures $\mu_i$ just defined): We denote by $K_n$ the number of data among our $n$ that
start with bit 0. Then $K_n$ has the binomial $B(n,\mu_0)$ distribution. In the Markov source model the
distribution of the second bit of every data string that starts with bit 0 is the measure $p_{00}\delta_0 + p_{01}\delta_1$. In particular, for any
data string $s = \xi_1\xi_2\ldots$ in the left bucket (i.e. $\xi_1 = 0$) the remaining suffix $\xi_2\xi_3\ldots$ is generated by
a Markov source model with initial distribution $p_{00}\delta_0 + p_{01}\delta_1$ and the same transition matrix as the original
source. Similarly, the remaining suffixes in the right bucket are generated by a Markov source
model with initial distribution $p_{10}\delta_0 + p_{11}\delta_1$ and the same transition matrix. Moreover, by the independence
of data strings within the Markov source model, the number of bucket operations in the left bucket
and the number of bucket operations in the right bucket are independent conditionally on $K_n$.
This leads to the following stochastic recurrence:

$$B_n^\mu \stackrel{d}{=} B^0_{K_n} + B^1_{n-K_n} + n, \qquad n \ge 2, \tag{4}$$
where $(B_0^0,\ldots,B_n^0)$, $(B_0^1,\ldots,B_n^1)$ and $K_n$ are independent and $\stackrel{d}{=}$ denotes that left and right hand
side have identical distributions. We will see later that we can directly transfer asymptotic results
for $B_n^0$ and $B_n^1$ to general $B_n^\mu$ via (4), see, e.g., the proof of Theorem 7.1.
In particular, (4) implies for $\mu = p_{00}\delta_0 + p_{01}\delta_1$ that

$$B_n^0 \stackrel{d}{=} B^0_{I_n} + B^1_{n-I_n} + n, \qquad n \ge 2, \tag{5}$$

with $(B_0^0,\ldots,B_n^0)$, $(B_0^1,\ldots,B_n^1)$ and $I_n$ independent, where $I_n$ is binomially $B(n,p_{00})$ distributed. A similar
argument yields a recurrence for $B_n^1$. Denoting by $J_n$ a binomially $B(n,p_{10})$ distributed random
variable, we have

$$B_n^1 \stackrel{d}{=} B^0_{J_n} + B^1_{n-J_n} + n, \qquad n \ge 2, \tag{6}$$

with $(B_0^0,\ldots,B_n^0)$, $(B_0^1,\ldots,B_n^1)$ and $J_n$ independent. Our asymptotic analysis of $B_n^\mu$ is based on
the distributional recurrence system (5)-(6) as well as (4).

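For further references, we abbreviate (5) and (6) by

$$B_n^i \stackrel{d}{=} B^0_{I_n^i} + B^1_{n-I_n^i} + n, \qquad n \ge 2,\ i \in \Sigma, \tag{7}$$

with $(B_0^0,\ldots,B_n^0)$, $(B_0^1,\ldots,B_n^1)$ and $I_n^i$ independent, where $I_n^i$ is binomial $B(n,p_{i0})$ distributed.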
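The distributional recurrence can be turned directly into a sampler for the number of bucket operations: draw the size of the left bucket from a binomial distribution, then recurse independently on both buckets, where the left bucket restarts from state 0 and the right bucket from state 1. The sketch below is a minimal illustration under assumptions made here: the function name and parameters are hypothetical, and the binomial variable is generated by summing Bernoulli draws, which is adequate for moderate n.

```python
import random


def sample_bucket_ops(n, i, p00, p10, rng):
    """Draw one sample of the number of bucket operations for n strings.

    `i` is the state governing the first bit: the first bit is 0 with
    probability p00 if i == 0 and with probability p10 if i == 1.
    """
    if n <= 1:
        return 0  # initial conditions: no operations for zero or one string
    p_i0 = p00 if i == 0 else p10
    # Size of the left bucket, binomial(n, p_i0).
    left = sum(rng.random() < p_i0 for _ in range(n))
    return (sample_bucket_ops(left, 0, p00, p10, rng)
            + sample_bucket_ops(n - left, 1, p00, p10, rng)
            + n)  # toll: every string in the bucket has one bit inspected


rng = random.Random(2024)
samples = [sample_bucket_ops(500, 0, 0.7, 0.2, rng) for _ in range(200)]
mean = sum(samples) / len(samples)
print(mean)
```

Averaging many such samples gives an empirical mean that can be compared with the leading term n log n / H, bearing in mind that the correction is of order n and therefore still clearly visible at such values of n.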


4 Transfer Theorems for Mean and Variance 


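Throughout this section, let $(a_i(n))_{n\in\mathbb{N}_0}$ and $(\varepsilon_i(n))_{n\in\mathbb{N}_0}$ be real valued sequences for $i \in \{0,1\}$.
Furthermore, let $I_n^i$ follow the binomial distribution $B(n,p_{i0})$ for $i \in \{0,1\}$. Suppose that these
sequences either satisfy

$$a_i(n) = \mathbb{E}[a_0(I_n^i)] + \mathbb{E}[a_1(n - I_n^i)] + \varepsilon_i(n), \qquad i \in \{0,1\},\ n \in \mathbb{N}, \tag{8}$$

which is the case for, e.g., $a_i(n) = \mathbb{E}[B_n^i]$ and $\varepsilon_i(n) = n\mathbf{1}_{[2,\infty)}(n)$, or satisfy

$$a_i(n) = p_{i0}\,\mathbb{E}[a_0(I_n^i)] + p_{i1}\,\mathbb{E}[a_1(n - I_n^i)] + \varepsilon_i(n), \qquad i \in \{0,1\},\ n \in \mathbb{N}, \tag{9}$$

which is the case for, e.g., $a_i(n) = f_i(n+1) - f_i(n)$ where $f_i(n) = \mathbb{E}[B_n^i] - \frac{1}{H} n\log n$ and
$\varepsilon_i(n) = 1 - H_i/H + O(n^{-1/2})$ (see the proof of Proposition 5.2).
Upper bounds on $\varepsilon_i(n)$ may be transferred to bounds on $a_i(n)$ by the following lemma: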


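Lemma 4.1. Assume that (8) holds. Then, $\varepsilon_i(n) = O(n^\alpha)$ for an $\alpha \in \mathbb{R}$ and both $i \in \{0,1\}$
implies, as $n \to \infty$,

$$a_i(n) = \begin{cases} O(n), & \text{if } \alpha < 1, \\ O(n^\alpha), & \text{if } \alpha > 1, \\ O(n\log n), & \text{if } \alpha = 1. \end{cases}$$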


More precisely, first order asymptotics of linear toll terms $\varepsilon_i(n)$ yield the following first order
asymptotics of $a_i(n)$:

Lemma 4.2. Assume that (8) holds. Then, $\varepsilon_i(n) = c_i n + O(n^\alpha)$ for $c_0, c_1 \in \mathbb{R}$ and $\alpha < 1$ and
both $i \in \{0,1\}$ implies that, as $n \to \infty$,

$$a_i(n) = \frac{\pi_0 c_0 + \pi_1 c_1}{H}\, n\log n + O(n)$$

with constants $\pi_0, \pi_1$ and $H$ given in (1) and (3).
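For the expected number of bucket operations (toll term n for n at least 2, so both linear coefficients equal 1), the recurrence can be solved exactly for moderate n, which gives a way to inspect the transfer result numerically. The sketch below is an illustration under assumptions made here: function and variable names are hypothetical, and binomial weights are evaluated in floating point, which limits n to a few hundred. For each n the two unknown values appear on the right hand side with the weights of the events that all strings fall into one bucket, so a 2x2 linear system is solved.

```python
from math import comb, log


def expected_bucket_ops(p00, p10, n_max):
    """Return (a0, a1) with a_i[n] the expected number of bucket operations
    for n strings whose first bit is governed by state i, 0 <= n <= n_max."""
    p = (p00, p10)
    a = ([0.0, 0.0], [0.0, 0.0])  # values for n = 0 and n = 1 are zero
    for n in range(2, n_max + 1):
        rhs = []
        for i in (0, 1):
            q = p[i]
            total = float(n)  # toll term for n >= 2
            for k in range(n + 1):
                w = comb(n, k) * q ** k * (1.0 - q) ** (n - k)
                if k < n:
                    total += w * a[0][k]       # left bucket of size k < n
                if k > 0:
                    total += w * a[1][n - k]   # right bucket of size n - k < n
            rhs.append(total)
        # The unknowns a_0(n), a_1(n) occur when one bucket receives all n strings.
        c00, c01 = 1.0 - p00 ** n, -((1.0 - p00) ** n)
        c10, c11 = -(p10 ** n), 1.0 - (1.0 - p10) ** n
        det = c00 * c11 - c01 * c10
        a[0].append((rhs[0] * c11 - c01 * rhs[1]) / det)
        a[1].append((c00 * rhs[1] - c10 * rhs[0]) / det)
    return a


a0, a1 = expected_bucket_ops(0.7, 0.2, 200)
print(a0[200] / (200 * log(200)), a1[200] / (200 * log(200)))
```

The printed ratios can be compared with 1/H. Because the error term in the lemma is of order n, the ratio approaches 1/H only at rate 1/log n, so a visible gap at n = 200 is to be expected.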

Similarly, there are the following results on transfers for (9):

Lemma 4.3. Assume that (9) holds. Then, $\varepsilon_i(n) = O(n^\alpha)$ for an $\alpha \in \mathbb{R}$ and both $i \in \{0,1\}$
implies that, as $n \to \infty$,

$$a_i(n) = \begin{cases} O(1), & \text{if } \alpha < 0, \\ O(n^\alpha), & \text{if } \alpha > 0, \\ O(\log n), & \text{if } \alpha = 0. \end{cases}$$

Lemma 4.4. Assume that (9) holds. Then, $\varepsilon_i(n) = c_i + O(n^{-\alpha})$ for $c_i \in \mathbb{R}$, $\alpha > 0$ and both
$i \in \{0,1\}$ implies, as $n \to \infty$,

$$a_i(n) = \frac{\pi_0 c_0 + \pi_1 c_1}{H}\, \log n + O(1), \qquad i \in \Sigma,$$

with constants $\pi_0, \pi_1$ and $H$ given in (1) and (3).

Proof of Lemma 4.1. The proof relies on the fact that $I_n^0$ and $I_n^1$ are concentrated around their
means $p_{00}n$ and $p_{10}n$. This leads to a geometric decay in the size of the toll term when iterating
(8) on the right hand side. It is more convenient to work with the monotone sequences given by

$$C_i(n) := \sup\{|a_i(k)| : 0 \le k \le n\}, \qquad C(n) := \max\{C_0(n), C_1(n)\}, \qquad i \in \Sigma,\ n \in \mathbb{N}_0.$$

Due to the upper bound $|a_i(n)| \le C(n)$ for both $i \in \{0,1\}$, an upper bound on $C(n)$ is sufficient
to prove the assertion. To this end, let $\max_{i,j\in\Sigma}\{p_{ij}\} < \delta < 1$ be a constant (the exact value
of $\delta$ does not matter) and decompose (8) into

$$|a_i(n)| \le \mathbb{E}\big[(C(I_n^i) + C(n - I_n^i))\mathbf{1}_{\{I_n^i \in [(1-\delta)n,\, \delta n]\}}\big] + 2C(n)\,\mathbb{P}\big(I_n^i \notin [(1-\delta)n, \delta n]\big) + |\varepsilon_i(n)|. \tag{10}$$

Note that at least one of the following three equalities needs to hold by definition:

$$C(n) = |a_0(n)| \quad\text{or}\quad C(n) = |a_1(n)| \quad\text{or}\quad C(n) = C(n-1).$$

Thus, the assumption on $\varepsilon_i(n)$ implies that there exists a constant $L > 0$ such that at least one
of the following two bounds holds:

$$\beta(n)\,C(n) \le \max_{i\in\Sigma}\Big\{\mathbb{E}\big[(C(I_n^i) + C(n - I_n^i))\mathbf{1}_{\{I_n^i \in [(1-\delta)n,\, \delta n]\}}\big]\Big\} + L n^\alpha, \qquad C(n) \le C(n-1), \tag{11}$$

where $\beta(n) := 1 - 2\max_{i\in\Sigma}\{\mathbb{P}(I_n^i \notin [(1-\delta)n, \delta n])\}$ converges to 1 by a Chernoff bound on the
binomial distribution (or the central limit theorem). Now (11) implies for any $\varepsilon > 0$ by induction
on $n$ that

$$C(n) \le D\, n^{\max\{1,\alpha\}} (1+\varepsilon)^n,$$

where $D = D(\varepsilon) > 0$ is a sufficiently large constant. This yields for any $K > 1$ the rough upper
bound $C(n) = O(K^n)$.

To refine this bound, note that a standard Chernoff bound on the binomial distribution implies
the existence of a constant $c > 0$ such that for all $n > 0$

$$|\beta(n) - 1| \le 4e^{-cn},$$

which together with $C(n) = O(K^n)$ for $1 < K < e^c$ yields a constant $L' > 0$ such that

$$|\beta(n) - 1|\, C(n) \le L' n^\alpha, \qquad n \in \mathbb{N}.$$

Combined with (11), this bound implies by induction on $n$ that

$$C(n) \le \tilde{L}\, n^\alpha \sum_{j=0}^{\lfloor -\log n/\log\delta \rfloor} \big(\delta^{\alpha-1}\big)^{j},$$

where $\tilde{L} > 0$ is a sufficiently large constant depending only on $L$, $L'$, $\delta$, $\alpha$ and finitely many initial values of $C$. Thus, the assertion holds by the asymptotic
of the geometric sum. □
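Proof of Lemma 4.2. An easy calculation reveals that the sequences

$$\tilde{a}_i(n) := a_i(n) - \frac{\pi_0 c_0 + \pi_1 c_1}{H}\, n\log n + \frac{c_{1-i} H_i}{(p_{10} + p_{01})H}\, n, \qquad n \in \mathbb{N},\ i \in \{0,1\},$$

satisfy

$$\tilde{a}_i(n) = \mathbb{E}[\tilde{a}_0(I_n^i)] + \mathbb{E}[\tilde{a}_1(n - I_n^i)] + O\big(n^{\max\{\alpha, 1/2\}}\big).$$

Thus, Lemma 4.1 yields $\tilde{a}_i(n) = O(n)$ and the assertion follows. More precisely, note that the
transformed sequences satisfy for all $n \in \mathbb{N}$ and $i \in \{0,1\}$

$$\tilde{a}_i(n) = \mathbb{E}[\tilde{a}_0(I_n^i)] + \mathbb{E}[\tilde{a}_1(n - I_n^i)] + \tilde{\varepsilon}_i(n)$$

with, for $h(x) := x\log x$ and $c := (\pi_0 c_0 + \pi_1 c_1)/H$,

$$\tilde{\varepsilon}_i(n) = \varepsilon_i(n) - c\big(h(n) - \mathbb{E}[h(I_n^i) + h(n - I_n^i)]\big) + \frac{c_{1-i} H_i}{(p_{10} + p_{01})H}\, n - \frac{c_1 H_0}{(p_{10} + p_{01})H}\, \mathbb{E}[I_n^i] - \frac{c_0 H_1}{(p_{10} + p_{01})H}\, \mathbb{E}[n - I_n^i].$$

Thus, it only remains to show $\tilde{\varepsilon}_i(n) = O(n^{\max\{\alpha,1/2\}})$. To this end, note that

$$\begin{aligned} h(n) - \mathbb{E}[h(I_n^i) + h(n - I_n^i)] &= -\mathbb{E}\big[n h(I_n^i/n) + n h(1 - I_n^i/n)\big] \\ &= H_i n - n\,\mathbb{E}\big[h(I_n^i/n) - h(p_{i0}) + h(1 - I_n^i/n) - h(p_{i1})\big] \\ &= H_i n + O\big(n^{1/2}\big), \end{aligned}$$

where the last equality holds by the concentration of the binomial distribution and the asymptotic
of $\log(1+x)$ as $x \to 0$ (note that $\log(I_n^i/n) - \log(p_{i0}) = \log(1 + (I_n^i - np_{i0})/(np_{i0}))$). Details can be
found in the appendix, equation (60). Since $\mathbb{E}[I_n^i] = p_{i0}n$ and $\pi_1/p_{01} = \pi_0/p_{10} = 1/(p_{01}+p_{10})$, the linear terms
cancel, so an easy calculation yields $\tilde{\varepsilon}_i(n) = O(n^{\max\{\alpha,1/2\}})$
and the assertion follows. □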


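Proof of Lemma 4.3. The idea is essentially the same as in the proof of Lemma 4.1. Once again, it
is more convenient to work with the monotone sequences $(C_i(n))_{n\ge 0}$ and $(C(n))_{n\ge 0}$ given by

$$C_i(n) := \sup\{|a_i(k)| : 0 \le k \le n\}, \qquad C(n) := \max\{C_0(n), C_1(n)\}, \qquad n \in \mathbb{N}_0,\ i \in \Sigma.$$

With $\max_{i,j\in\Sigma}\{p_{ij}\} < \delta < 1$ equation (9) may be decomposed into

$$|a_i(n)| \le \mathbb{E}\big[(p_{i0}C_0(I_n^i) + p_{i1}C_1(n - I_n^i))\mathbf{1}_{\{I_n^i \in [(1-\delta)n,\, \delta n]\}}\big] + C(n)\,\mathbb{P}\big(I_n^i \notin [(1-\delta)n, \delta n]\big) + |\varepsilon_i(n)|.$$

As in the proof of Lemma 4.1 this implies $C(n) = O(K^n)$ for any constant $K > 1$ and, by a standard
Chernoff bound on the binomial distribution,

$$|a_i(n)| \le \mathbb{E}\big[(p_{i0}C_0(I_n^i) + p_{i1}C_1(n - I_n^i))\mathbf{1}_{\{I_n^i \in [(1-\delta)n,\, \delta n]\}}\big] + L n^\alpha$$

for a suitable constant $L > 0$. One obtains by induction on $n$ that

$$C(n) \le \tilde{L} \sum_{k=0}^{\lfloor -\log n/\log\delta \rfloor} \big(\delta^k n\big)^\alpha$$

for a sufficiently large constant $\tilde{L} > 0$, and the assertion follows by the asymptotic behavior of the geometric sum. □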

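Proof of Lemma 4.4. An easy calculation reveals that the sequences

$$\tilde{a}_i(n) := a_i(n) - L\log(n) + \frac{c_{1-i} H_i}{(p_{01} + p_{10})H}, \qquad i \in \{0,1\},\ n \in \mathbb{N},$$

with $L = (\pi_0 c_0 + \pi_1 c_1)/H$ satisfy

$$\tilde{a}_i(n) = p_{i0}\,\mathbb{E}[\tilde{a}_0(I_n^i)] + p_{i1}\,\mathbb{E}[\tilde{a}_1(n - I_n^i)] + O\big(n^{-\min\{\alpha, 1/2\}}\big).$$

Thus, Lemma 4.3 implies the assertion. □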








5 Analysis of the Mean 

First we study the asymptotic behavior of the expected number of bucket operations with a precise
error term needed to derive a limit law in Section 7.

Theorem 5.1. For the number $B_n^\mu$ of bucket operations under the Markov source model with
conditions (2) we have

$$\mathbb{E}[B_n^\mu] = \frac{1}{H}\, n\log n + O(n), \qquad (n \to \infty),$$

with the entropy rate $H$ of the Markov chain given in (3). The $O(n)$ error term is uniform in the
initial distribution $\mu$.

Our proof of Theorem 5.1 as well as the corresponding limit law in Theorem 7.1 depend
on refined properties of the $O(n)$ error term that are first obtained for the initial distributions
$\mu_0 = p_{00}\delta_0 + p_{01}\delta_1$ and $\mu_1 = p_{10}\delta_0 + p_{11}\delta_1$ and then generalized to arbitrary initial distributions
via (4). For those initial distributions we denote the error term for all $n \in \mathbb{N}_0$ and $i \in \Sigma$ by

$$f_i(n) := \mathbb{E}[B_n^i] - \frac{1}{H}\, n\log n. \tag{12}$$

The following Lipschitz continuity of $f_0$ and $f_1$ is crucial for our further analysis:

Proposition 5.2. There exists a constant $C > 0$ such that for both $i \in \Sigma$ and all $m,n \in \mathbb{N}_0$

$$|f_i(m) - f_i(n)| \le C|m - n|.$$

In order to prove the Lipschitz continuity of the error terms $f_0$ and $f_1$ (Proposition 5.2) we will
analyze the increments of $(f_0(n))_{n\ge 0}$ and $(f_1(n))_{n\ge 0}$ and apply Lemma 4.4. We use the following
notation for the increments:

For a sequence $x = (x(n))_{n\ge 0}$ in $\mathbb{R}$ we denote its (finite forward) difference sequence by
$(\Delta x(n))_{n\ge 0}$, where

$$\Delta x(n) := (\Delta x)(n) := x(n+1) - x(n), \qquad n \in \mathbb{N}.$$

Note that the order of operations is to apply the $\Delta$-operator to the sequence first and then to evaluate
the difference sequence at $n$. In particular, for any sequence $(m_n)_{n\in\mathbb{N}}$ in $\mathbb{N}_0$ we have

$$\Delta x(m_n) = x(m_n + 1) - x(m_n), \qquad n \in \mathbb{N}_0$$

(and in general $\Delta x(m_n) \neq x(m_{n+1}) - x(m_n)$).

In the analysis of $(\Delta f_i(n))_{n\ge 0}$, $i \in \Sigma$, we use the following lemma, which is a special case of
Lemma 2 in Schachinger [35].

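Lemma 5.3. For any real sequence $(a(n))_{n\ge 0}$ and binomially $B(n,p)$ distributed $X_n$ with $p \in (0,1)$
we have

$$\Delta\mathbb{E}[a(X_n)] = p\,\mathbb{E}[\Delta a(X_n)], \qquad n \in \mathbb{N}.$$

Proof. Note that $X_{n+1} \stackrel{d}{=} X_n + B$ in which $B$ and $X_n$ are independent and $\mathbb{P}(B = 1) = p =
1 - \mathbb{P}(B = 0)$. This yields

$$\Delta\mathbb{E}[a(X_n)] = \mathbb{E}[a(X_n + B) - a(X_n)] = p\,\mathbb{E}[\Delta a(X_n)],$$

which is the assertion. □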
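The identity of Lemma 5.3, namely that the increment in n of E[a(X_n)] equals p times E[a(X_n + 1) - a(X_n)] for binomial X_n, can be checked exactly for small n by summing over the binomial weights. The following sketch is only a numerical illustration, with hypothetical function names.

```python
from math import comb, log


def binomial_expectation(a, n, p):
    """E[a(X)] for X binomial(n, p), computed by direct summation."""
    return sum(comb(n, k) * p ** k * (1.0 - p) ** (n - k) * a(k)
               for k in range(n + 1))


def both_sides(a, n, p):
    """Return (Delta E[a(X_n)], p * E[Delta a(X_n)])."""
    lhs = binomial_expectation(a, n + 1, p) - binomial_expectation(a, n, p)
    rhs = p * binomial_expectation(lambda k: a(k + 1) - a(k), n, p)
    return lhs, rhs


def h(k):
    """h(k) = k log k with the convention h(0) = 0."""
    return k * log(k) if k > 0 else 0.0


print(both_sides(h, 10, 0.3))
print(both_sides(lambda k: k * k, 7, 0.3))
```

For a(k) = k^2 both sides can also be computed by hand: with n = 7 and p = 0.3 they equal p + 2np^2 = 1.56.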
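Proof of Proposition 5.2. Note that (7) implies

$$f_i(n) = \mathbb{E}[f_0(I_n^i)] + \mathbb{E}[f_1(n - I_n^i)] + \varepsilon_i(n)$$

with the toll function

$$\varepsilon_i(n) = n - \frac{1}{H}\big(n\log n - \mathbb{E}[I_n^i\log I_n^i] - \mathbb{E}[(n - I_n^i)\log(n - I_n^i)]\big).$$

Thus, Lemma 5.3 yields for the increments $a_i(n) := \Delta f_i(n)$

$$a_i(n) = p_{i0}\,\mathbb{E}[a_0(I_n^i)] + p_{i1}\,\mathbb{E}[a_1(n - I_n^i)] + \Delta\varepsilon_i(n).$$

Moreover, another application of Lemma 5.3 yields

$$\Delta\varepsilon_i(n) = 1 - \frac{1}{H}\big(\Delta h(n) - p_{i0}\,\mathbb{E}[\Delta h(I_n^i)] - p_{i1}\,\mathbb{E}[\Delta h(n - I_n^i)]\big),$$

where $h(x) := x\log x$. Since $\Delta h(n) = \log(n+1) + n\log(1 + 1/n) = \log(n+1) + 1 + O(1/n)$, one
obtains

$$\begin{aligned} \Delta\varepsilon_i(n) &= 1 - \frac{1}{H}\big(\log(n+1) - p_{i0}\,\mathbb{E}[\log(I_n^i + 1)] - p_{i1}\,\mathbb{E}[\log(n - I_n^i + 1)]\big) + O(1/n) \\ &= 1 - \frac{1}{H}\big(-p_{i0}\log p_{i0} - p_{i1}\log p_{i1}\big) + O\big(n^{-1/2}\big). \end{aligned}$$

The last equation is based on the fact that $\mathbb{E}[\log((I_n^i + 1)/(n+1))] = \log(p_{i0}) + O(n^{-1/2})$ for any
binomially $B(n,p_{i0})$ distributed $I_n^i$ (details are given in the appendix, equation (58)). Therefore,
Lemma 4.4 implies $\Delta f_i(n) = L\log n + O(1)$ with a constant

$$L = \frac{1}{H}\left(\pi_0\Big(1 - \frac{1}{H}(-p_{00}\log p_{00} - p_{01}\log p_{01})\Big) + \pi_1\Big(1 - \frac{1}{H}(-p_{10}\log p_{10} - p_{11}\log p_{11})\Big)\right) = 0,$$

where the last equality holds since $\pi_0 H_0 + \pi_1 H_1 = H$ and $\pi_0 + \pi_1 = 1$.
Thus, $\Delta f_i(n)$ is bounded and the assertion follows. □


Proof of Theorem 5.1. For $\mu = p_{i0}\delta_0 + p_{i1}\delta_1$, $i \in \{0,1\}$, Theorem 5.1 is an immediate consequence
of Proposition 5.2. For the general case let $\nu_i(n) := \mathbb{E}[B_n^i]$. Then, the distributional recursion (4)
yields

$$\mathbb{E}[B_n^\mu] = \mathbb{E}[\nu_0(K_n)] + \mathbb{E}[\nu_1(n - K_n)] + n.$$

Thus, $\nu_i(n) = n\log n/H + O(n)$ implies

$$\mathbb{E}[B_n^\mu] = \frac{1}{H}\, n\log n + \frac{n}{H}\big(\mathbb{E}[h(K_n/n)] + \mathbb{E}[h(1 - K_n/n)]\big) + O(n),$$

where $h(x) = x\log x$. Since $h$ is uniformly bounded on $[0,1]$, the assertion follows. □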


6 Analysis of the Variance 


In this section we establish the precise growth of the variance with a uniform bound. We prove the
following theorem.


Theorem 6.1. For the number $B_n^\mu$ of bucket operations under the Markov source model with
conditions (2) we have, as $n \to \infty$,

$$\mathrm{Var}(B_n^\mu) = \sigma^2 n\log n + O\big(n\sqrt{\log n}\big), \tag{13}$$

where $\sigma^2 > 0$ is independent of the initial distribution $\mu$ and given by

$$\sigma^2 = \frac{\pi_0\, p_{00}\, p_{01}}{H^3}\left(\log(p_{00}/p_{01}) + \frac{H_1 - H_0}{p_{01} + p_{10}}\right)^2 + \frac{\pi_1\, p_{10}\, p_{11}}{H^3}\left(\log(p_{10}/p_{11}) + \frac{H_1 - H_0}{p_{01} + p_{10}}\right)^2. \tag{14}$$
In order to derive the first order asymptotics of the variance without studying the mean in
detail, we extend an idea of Schachinger [35] to Markov sources. The main ingredient is to split
the number of bucket operations into a sum of two random variables such that the mean and variance
of the first random variable are easy to derive and the variance of the second random variable is
small (i.e. $O(n)$).

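Once again, for $i \in \Sigma$ and $n \in \mathbb{N}_0$ let $I_n^i$ be a binomial $B(n,p_{i0})$ distributed random variable.
Now let $(X_n^0, Z_n^0)_{n\in\mathbb{N}_0}$, $(X_n^1, Z_n^1)_{n\in\mathbb{N}_0}$ and $(I_n^i)_{n\in\mathbb{N}_0}$ be independent sequences of random
variables with finite second moments that satisfy the initial conditions

$$X_n^i = Z_n^i = 0, \qquad i \in \Sigma,\ n \le 1,$$

and, for all $n \ge 2$ and $i \in \Sigma$,

$$\begin{pmatrix} X_n^i \\ Z_n^i \end{pmatrix} \stackrel{d}{=} \begin{pmatrix} X^0_{I_n^i} \\ Z^0_{I_n^i} \end{pmatrix} + \begin{pmatrix} X^1_{n - I_n^i} \\ Z^1_{n - I_n^i} \end{pmatrix} + \begin{pmatrix} \eta^i_{X,n} \\ \eta^i_{Z,n} \end{pmatrix}, \tag{15}$$

where the toll terms are given by $\eta^i_{X,n} = \eta^i_{Z,n} = 0$ for $n \le 1$ and, for $n \ge 2$,

$$\begin{aligned} \eta^i_{X,n} &:= \frac{1}{H}\Big(n\log(n) - \mathbb{E}\big[I_n^i\log(I_n^i) + (n - I_n^i)\log(n - I_n^i)\big]\Big) + \pi_{1-i}\,\frac{H_{1-i} - H_i}{H}\, n - \frac{H_0 - H_1}{(p_{01} + p_{10})H}\, p_{i0}\, p_{i1}^{\,n-1}\, n, \\ \eta^i_{Z,n} &:= n - \eta^i_{X,n}. \end{aligned} \tag{16}$$

Since we have $\eta^i_{X,n} + \eta^i_{Z,n} = n$, note that the sum $S_n^i := X_n^i + Z_n^i$ satisfies the same initial conditions
and the same stochastic recurrence as $B_n^i$, i.e. equation (7) and $S_n^i = 0 = B_n^i$ for $n \le 1$. In
particular, this implies that $S_n^i$ and $B_n^i$ have the same mean and variance. A discussion on the
existence of a splitting satisfying (15) and the equality of the moments of $S_n^i$ and $B_n^i$ is given in
a later section.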


Remark. The choice of $\eta^i_{X,n}$ is motivated as follows: Since $Z_n^i$ should be small ($\mathbb{E}[Z_n^i] = O(n)$,
$\mathrm{Var}(Z_n^i) = O(n)$), $X_n^i$ should satisfy $\mathbb{E}[X_n^i] \sim \frac{1}{H} n\log(n)$, which is the reason for the choice of the
first summand in (16). The linear term is chosen to obtain $\eta^i_{X,n} \sim n$ and therefore $\eta^i_{Z,n} = o(n)$,
which implies a small variance for $Z_n^i$. The last summand is chosen for technical reasons to
compensate the second one in the calculation of $\mathbb{E}[X_n^i]$.

The proof of Theorem 6.1 works as follows: first we study the asymptotics of $\mathrm{Var}(X_n^i)$ and
$\mathrm{Var}(Z_n^i)$ and then deduce the asymptotics of $\mathrm{Var}(B_n^i)$ by the following lemma:

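Lemma 6.2. For any random variables $X, Y$ with finite second moments we have

$$\Big(\sqrt{\mathrm{Var}(X)} - \sqrt{\mathrm{Var}(Y)}\Big)^2 \le \mathrm{Var}(X + Y) \le \Big(\sqrt{\mathrm{Var}(X)} + \sqrt{\mathrm{Var}(Y)}\Big)^2. \tag{17}$$

In particular, if sequences $(X_n)_{n\ge 0}$, $(Y_n)_{n\ge 0}$ with finite second moments satisfy $\mathrm{Var}(Y_n) = o(\mathrm{Var}(X_n))$,
then we have

$$\mathrm{Var}(X_n + Y_n) = \mathrm{Var}(X_n) + O\Big(\sqrt{\mathrm{Var}(X_n)\mathrm{Var}(Y_n)}\Big). \tag{18}$$

Proof. By the Cauchy-Schwarz inequality we have

$$|\mathrm{Cov}(X,Y)| \le \sqrt{\mathrm{Var}(X)}\sqrt{\mathrm{Var}(Y)},$$

which together with $\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\mathrm{Cov}(X,Y)$ implies (17). Moreover, (17)
implies (18). □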
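The last step of the proof of Lemma 6.2 can be spelled out as follows. Expanding both squares in the two-sided bound on the variance of the sum and using that the variance of the second sequence is eventually no larger than that of the first gives the claimed error term with an explicit constant. The constant 3 below is only one admissible choice and is not taken from the paper.

```latex
\begin{align*}
\bigl|\mathrm{Var}(X_n+Y_n)-\mathrm{Var}(X_n)\bigr|
  &\le 2\sqrt{\mathrm{Var}(X_n)\,\mathrm{Var}(Y_n)}+\mathrm{Var}(Y_n)
     && \text{(expand both squares)}\\
  &\le 3\sqrt{\mathrm{Var}(X_n)\,\mathrm{Var}(Y_n)}
     && \text{(as soon as } \mathrm{Var}(Y_n)\le \mathrm{Var}(X_n)\text{)}.
\end{align*}
```

The second inequality uses that $\mathrm{Var}(Y_n) = \sqrt{\mathrm{Var}(Y_n)}\sqrt{\mathrm{Var}(Y_n)} \le \sqrt{\mathrm{Var}(X_n)\mathrm{Var}(Y_n)}$ whenever $\mathrm{Var}(Y_n) \le \mathrm{Var}(X_n)$, which holds for all sufficiently large $n$ under the assumption $\mathrm{Var}(Y_n) = o(\mathrm{Var}(X_n))$.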
The analysis of $\mathrm{Var}(X_n^i)$ is done with Lemma 4.2. This requires a detailed asymptotic expansion
of $\mathbb{E}[X_n^i]$. The choice of $\eta^i_{X,n}$ leads to the following representation of the mean:

Lemma 6.3. Let $X_n^i$ be as in (15). Then we have for all $n \in \mathbb{N}_0$

$$\mathbb{E}[X_n^0] = \frac{1}{H}\, n\log n + \frac{H_1 - H_0}{(p_{01} + p_{10})H}\, n\,\mathbf{1}_{\{n \ge 2\}}, \qquad \mathbb{E}[X_n^1] = \frac{1}{H}\, n\log n. \tag{19}$$

Proof. Let $\nu_X^i : \mathbb{N}_0 \to \mathbb{R}$ be given by $\nu_X^i(n) := \mathbb{E}[X_n^i]$, $i \in \{0,1\}$. Note that $\nu_X^i$ is uniquely
determined by its initial conditions $\nu_X^i(n) = 0$ for $n \le 1$ and the recursion

$$\nu_X^i(n) = \mathbb{E}[\nu_X^0(I_n^i)] + \mathbb{E}[\nu_X^1(n - I_n^i)] + \eta^i_{X,n}, \qquad i \in \Sigma,\ n \ge 2,$$

which arises from the recursion (15). Thus, it only remains to check that the choice given in (19)
satisfies these conditions, which is an easy calculation. Details are left to the reader. □

These expressions and lemma IT^ lead to the following asymptotics of Var(X*): 

Lemma 6.4. We have for both $i \in \Sigma$, as $n \to \infty$,

$$\mathrm{Var}(X_n^i) = \sigma^2 n \log n + O(n)$$

where is given by m- 

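Proof. Let $V_X^i(n) := \mathrm{Var}(X_n^i)$ and $\nu_X^i(n) := \mathbb{E}[X_n^i]$ as in the previous proof. Then, the recursion
(15) and the independence therein imply

$$V_X^i(n) = \mathbb{E}[V_X^0(I_n^i)] + \mathbb{E}[V_X^1(n - I_n^i)] + \mathrm{Var}\left(\nu_X^0(I_n^i) + \nu_X^1(n - I_n^i)\right). \tag{20}$$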

It suffices to derive the first order asymptotic of Var(r'^ {In) + 4 {n — In)) to apply lemma IT^ 
To this end, note that by lemma [6)3] with the notation h{x) := xloga; 


Var(4(j;) + 4(n 


/;)) = Var 


f h{Pn)+h{n-Pn) 


Hi-Ho 
{Poi +Pio)H 



( 21 ) 


where Rn = — thus, Var(i?)j) = o(l). Subtracting nlogn in the variance on 

the right hand side of (ED yields 


Var(i/^(/;) + T^]^{n - If)) = Var ( —{If logpio + {n - If) logpn) + 


Hi-Ho 
{pqi +Pio)H 


II + K 


where Rf = Rf + ^{If{log{If/n) - \ogpio) + {n- If){\og{l - If/n) - \og{p^i))). It is not hard to 
check that Vai{Rf) = O(logn), as formally proved below. Therefore, combined with lemma [6)2] 
and Var(J*) = piopun 


VaT{i^^{If) + vx{n - If)) = (^logp*o - logpa + 

Hence, the assertion follows by (PD|l and lemma 


2 

Pi0Piin + O{n‘^^^). 


To complete the proof we now establish that $\mathrm{Var}(\widetilde{R}_n) = O(\log n)$. Note that the function

$$\phi : [0, 1] \to \mathbb{R}, \qquad x \mapsto x(\log x - \log p_{i0}) + (1 - x)(\log(1 - x) - \log(1 - p_{i0}))$$

is bounded and that its derivative is given by $\phi'(x) = \log(x/p_{i0}) - \log((1 - x)/(1 - p_{i0}))$. In
particular, there exists a constant $C > 0$ such that for all sufficiently large $n$

$$|\phi'(x)| \le C \sqrt{\frac{\log n}{n}}, \qquad x \in \left[p_{i0} - \sqrt{(\log n)/n},\ p_{i0} + \sqrt{(\log n)/n}\right].$$
One obtains


by the boundedness of (j) and a standard Chernoff bound and, by the previous observations, the 
mean value theorem and a self centering argument (let be an independent copy of /^) 


^n/'^')^{I^—npio\<^n log n} 4^(.^n/‘^')'^{\J^—npio\<^n log n}^ 




CMogn 
2 n 


E 


{Ijn- 



+ O (n-2) = O 



The bound on Var(i?(j) follows by lemma l 6 ^ since + n(()(/^/n) and Var(i?(j) = o(l). □ 

In order to derive the asymptotics of $\mathrm{Var}(Z_n^i)$ we start with an upper bound on $\eta_n^{i,2}$:

Lemma 6.5. For $\eta_n^{i,2}$ defined in (16) we have for both $i \in \Sigma$, as $n \to \infty$,

$$\eta_n^{i,2} = O(\log n).$$

Proof. By the definition of $\eta_n^{i,2}$ in (16) one only needs to compute the asymptotics of

$$h(n) - \mathbb{E}[h(I_n^i)] - \mathbb{E}[h(n - I_n^i)], \qquad h(n) := n \log n.$$

Since $h(n) = \mathbb{E}[I_n^i \log n] + \mathbb{E}[(n - I_n^i) \log n]$, one obtains

$$h(n) - \mathbb{E}[h(I_n^i)] - \mathbb{E}[h(n - I_n^i)] = -n\left(\mathbb{E}[h(I_n^i/n)] + \mathbb{E}[h(1 - I_n^i/n)]\right) = n H_i - n\,\mathbb{E}[\phi(I_n^i/n)],$$

where $H_i = -p_{i0} \log p_{i0} - p_{i1} \log p_{i1}$ and $\phi(x) = x(\log x - \log p_{i0}) + (1 - x)(\log(1 - x) - \log(1 - p_{i0}))$.
With the same arguments as at the end of the previous proof one obtains $|\phi(x)| = O((\log n)/n)$
uniformly for $x \in [p_{i0} - \sqrt{(\log n)/n},\ p_{i0} + \sqrt{(\log n)/n}]$, which implies by a standard Chernoff bound
on the binomial distribution that $n\,\mathbb{E}[\phi(I_n^i/n)] = O(\log n)$. Hence, $\eta_n^{i,1} = n + O(\log n)$ since
$H = \pi_0 H_0 + \pi_1 H_1$, and the assertion follows since $\eta_n^{i,2} = n - \eta_n^{i,1}$. □
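
The key estimate in this proof, $n\,\mathbb{E}[\phi(I_n^i/n)] = O(\log n)$, can be evaluated exactly for a binomial $I_n^i$. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, `p` plays the role of $p_{i0}$, and the convention $0 \log 0 := 0$ is used.

```python
import math


def binom_pmf(n, k, p):
    """P(Bin(n, p) = k), computed in log space for numerical stability."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1.0 - p))
    return math.exp(log_pmf)


def x_log_ratio(x, y):
    """x * log(x / y) with the convention 0 * log 0 = 0."""
    return 0.0 if x == 0.0 else x * math.log(x / y)


def phi(x, p):
    """phi(x) = x(log x - log p) + (1 - x)(log(1 - x) - log(1 - p))."""
    return x_log_ratio(x, p) + x_log_ratio(1.0 - x, 1.0 - p)


def n_times_mean_phi(n, p):
    """Exact value of n * E[phi(I/n)] for I ~ Bin(n, p)."""
    return n * sum(binom_pmf(n, k, p) * phi(k / n, p) for k in range(n + 1))


values = {n: n_times_mean_phi(n, 0.3) for n in (100, 1000, 5000)}
```

In this sketch the computed values stay well below $\log n$, which is consistent with the bound stated in the lemma.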

Note that we have the following Lipschitz continuity of the means:

Lemma 6.6. For $i \in \Sigma$ let $\nu_Z^i : \mathbb{N}_0 \to \mathbb{R}$ be given by

$$\nu_Z^i(n) = \mathbb{E}[Z_n^i],$$

where $(Z_n^i)_{n \in \mathbb{N}_0, i \in \Sigma}$ satisfies (15). Then, the functions $\nu_Z^0$ and $\nu_Z^1$ are Lipschitz continuous,
i.e. there exists a constant $C > 0$ such that for $i \in \Sigma$ and $n, m \in \mathbb{N}_0$ we have

$$|\nu_Z^i(n) - \nu_Z^i(m)| \le C |n - m|.$$

Proof. Since we have

$$\mathbb{E}[X_n^i + Z_n^i] = \mathbb{E}[B_n^i],$$

the assertion immediately follows from Proposition 5.2 and Lemma 6.3. □


The next step is to show that $\mathrm{Var}(Z_n^i) = O(n)$, which we present in Lemma 6.9 below. However,
to establish it we need another key ingredient, namely poissonization. In poissonization one
replaces $n$ by a Poisson $\Pi(\lambda)$ distributed random variable $N$ to derive asymptotics as $\lambda \to \infty$. This
turns out to be easier than the original problem owing to some nice properties of the Poisson process
such as independence of the splitting processes. The transfer lemma used after poissonization is
the following:
Lemma 6.7. For $i \in \Sigma$ let $f_i : \mathbb{R}^+ \to \mathbb{R}$ be some function that is bounded on $(0, a]$ for all $a > 0$.
Assume that there exist constants $p_0, p_1 \in (0, 1)$ such that for all $x > 0$ and $i \in \Sigma$

$$f_i(x) = f_i(x p_i) + f_{1-i}(x(1 - p_i)) + \eta_i(x), \tag{22}$$

where $\eta_i : \mathbb{R}^+ \to \mathbb{R}$ is some function.

Then, as $x \to \infty$, $\eta_i(x) = O(x^{1-\alpha})$ for some $\alpha > 0$ and both $i \in \Sigma$ implies

$$f_i(x) = O(x), \qquad i \in \Sigma.$$

Proof. Iterating (22), by induction on $n$ we find that for a sufficiently large constant $C > 0$ and
all $n \in \mathbb{N}$

I _ log X I 
L log PV J 

lA(x)l<Cx x G [i,Pv"']> * e {0. i}> 

j=0 

where $p_\vee := \max\{p_0, p_1, 1 - p_0, 1 - p_1\}$. The assertion follows since the sum converges as $x \to \infty$.
Details on the induction are left to the reader. □
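
The linear growth asserted by Lemma 6.7 can be illustrated by unrolling the recursion (22) for one concrete choice of data. This is a sketch under assumptions that are not taken from the paper: $p_0 = 0.6$, $p_1 = 0.7$, $\alpha = 1/2$, $\eta_i(x) = x^{1-\alpha}$ for $x > 1$ and $\eta_i(x) = 0$ otherwise, and $f_i \equiv 0$ on $(0, 1]$, so that (22) holds for all $x > 0$.

```python
def eta(x, alpha):
    """Illustrative toll: x^(1 - alpha) above 1, zero on (0, 1]."""
    return x ** (1.0 - alpha) if x > 1.0 else 0.0


def f(x, i, p=(0.6, 0.7), alpha=0.5):
    """Solve f_i(x) = f_i(x p_i) + f_{1-i}(x (1 - p_i)) + eta(x) by recursion.

    The recursion stops once the argument drops to 1 or below, where f is
    set to zero (an assumption made only for this sketch).
    """
    if x <= 1.0:
        return 0.0
    q = p[i]
    return f(x * q, i, p, alpha) + f(x * (1.0 - q), 1 - i, p, alpha) + eta(x, alpha)


# Ratios f_i(x) / x for growing x; they should remain bounded.
ratios = {(x, i): f(x, i) / x for x in (10.0, 100.0, 1000.0) for i in (0, 1)}
```

For these parameters a short counting argument over dyadic size bands shows $f_i(x) \le 10x$, so the ratios above stay bounded even though the number of recursive calls grows with $x$.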


The crucial part after poissonization is to transfer the asymptotics as $\lambda \to \infty$ into asymptotics
of the original problem. One way of doing this is the next lemma:

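Lemma 6.8. Let $(a(n))_{n \in \mathbb{N}_0}$ be a real valued sequence. Moreover, let $N_\lambda$ be Poisson distributed
with mean $\lambda > 0$. Then, as $n \to \infty$, $\Delta a(n) := a(n + 1) - a(n) = O(\sqrt{n})$ implies

$$|a(n) - \mathbb{E}[a(N_n)]| = O(n).$$

Proof. First note that $\Delta a(n) = O(\sqrt{n})$ implies that there exists a constant $C > 0$ such that for
all $n, m \in \mathbb{N}_0$

$$|a(n) - a(m)| = \left|\sum_{j = m \wedge n}^{m \vee n - 1} \Delta a(j)\right| \le C \sqrt{m \vee n}\, |n - m|.$$

Hence, we have that

$$|a(n) - \mathbb{E}[a(N_n)]| \le \mathbb{E}[|a(n) - a(N_n)|] \le C\, \mathbb{E}\left[\sqrt{n \vee N_n}\, |N_n - n|\right],$$

which implies the assertion by the Cauchy-Schwarz inequality, since $\mathbb{E}[n \vee N_n] \le 2n$ and $\mathbb{E}[(N_n - n)^2] = n$. □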
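
The depoissonization bound of Lemma 6.8 can be illustrated for a concrete sequence. The sketch below assumes the test sequence $a(n) = n^{3/2}$, whose increments are $O(\sqrt{n})$; the function names and the truncation window of the Poisson sum are illustrative choices, not part of the paper.

```python
import math


def poisson_pmf(lam, k):
    """P(Poisson(lam) = k), computed in log space."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def poissonized_mean(a, lam, width=12.0):
    """Approximate E[a(N_lam)] by summing over a window around the mean.

    The window lam +/- width * sqrt(lam) is an assumption of this sketch;
    the neglected tail mass is negligible for the values used here.
    """
    lo = max(0, int(lam - width * math.sqrt(lam)))
    hi = int(lam + width * math.sqrt(lam)) + 1
    return sum(poisson_pmf(lam, k) * a(k) for k in range(lo, hi + 1))


def depoissonization_gap(a, n):
    """|a(n) - E[a(N_n)]| for a Poisson variable N_n with mean n."""
    return abs(a(n) - poissonized_mean(a, float(n)))


def a_example(n):
    """Illustrative sequence with increments of order sqrt(n)."""
    return n ** 1.5


gaps = {n: depoissonization_gap(a_example, n) / n for n in (10, 100, 1000)}
```

For this sequence the normalized gap `gaps[n]` is in fact much smaller than the $O(1)$ guaranteed by the lemma, which only provides an upper bound.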

This finally leads to the following bounds on $\mathrm{Var}(Z_n^0)$ and $\mathrm{Var}(Z_n^1)$, which we present next.

Lemma 6.9. We have for both $i \in \Sigma$, as $n \to \infty$,

$$\mathrm{Var}(Z_n^i) = O(n).$$

Proof. Let $V_Z^i(n) := \mathrm{Var}(Z_n^i)$ and $\nu_Z^i(n) := \mathbb{E}[Z_n^i]$. First note that similar arguments to the ones
given in the proof of Lemma 6.4 reveal that

$$V_Z^i(n) = \mathbb{E}[V_Z^0(I_n^i)] + \mathbb{E}[V_Z^1(n - I_n^i)] + \mathrm{Var}\left(\nu_Z^0(I_n^i) + \nu_Z^1(n - I_n^i)\right). \tag{23}$$

Since $\nu_Z^0$ and $\nu_Z^1$ are Lipschitz continuous, we have $\mathrm{Var}(\nu_Z^0(I_n^i) + \nu_Z^1(n - I_n^i)) = O(n)$, which can
be proven by a self centering argument similar to the one at the end of the proof of Lemma 6.4.
Thus, lemma HIT] yields the rough upper bound 

$$\mathrm{Var}(Z_n^i) = O(n \log n). \tag{24}$$


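In order to refine this bound, let $N_\lambda$ be a Poisson distributed random variable with mean $\lambda > 0$
which is independent of $\{(Z_n^i, I_n^i) : n \ge 0,\ i \in \{0, 1\}\}$. Then, (15) implies for both $i \in \Sigma$

$$Z^i_{N_\lambda} \stackrel{d}{=} Z^0_{N_{\lambda p_{i0}}} + Z^1_{M_{\lambda p_{i1}}} + \eta^{i,2}_{N_\lambda}, \tag{25}$$

where $N_{\lambda p_{i0}} := I^i_{N_\lambda}$ and $M_{\lambda p_{i1}} := N_\lambda - I^i_{N_\lambda}$. It is a well known fact, e.g. from Poisson processes,
that $N_{\lambda p_{i0}}$ and $M_{\lambda p_{i1}}$ are independent and Poisson distributed with means $\lambda p_{i0}$ and $\lambda p_{i1}$.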
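
The splitting property used here can be verified exactly on the level of probability mass functions: drawing $N \sim \mathrm{Poisson}(\lambda)$ and then splitting it binomially with success probability $p$ gives the same joint law as two independent Poisson variables with means $\lambda p$ and $\lambda(1 - p)$. The following sketch compares both sides; the function names are illustrative.

```python
import math


def poisson_pmf(lam, k):
    """P(Poisson(lam) = k)."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def binom_pmf(n, k, p):
    """P(Bin(n, p) = k)."""
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)


def joint_split_pmf(lam, p, a, b):
    """P(first part = a, second part = b) after binomial splitting of N."""
    return poisson_pmf(lam, a + b) * binom_pmf(a + b, a, p)


def product_pmf(lam, p, a, b):
    """Joint pmf of two independent Poisson(lam p), Poisson(lam (1 - p))."""
    return poisson_pmf(lam * p, a) * poisson_pmf(lam * (1.0 - p), b)


# Compare both joint laws on a small grid; lam and p are arbitrary choices.
max_abs_diff = max(
    abs(joint_split_pmf(3.5, 0.3, a, b) - product_pmf(3.5, 0.3, a, b))
    for a in range(8) for b in range(8)
)
```

Both expressions reduce algebraically to $e^{-\lambda} (\lambda p)^a (\lambda(1-p))^b / (a!\, b!)$, so the difference is pure rounding error.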

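Note that $V_Z^i(n) = O(n \log n)$ and the Lipschitz continuity of $\nu_Z^i$ imply that, as $\lambda \to \infty$,

$$\mathrm{Var}(Z^i_{N_\lambda}) = \mathbb{E}[V_Z^i(N_\lambda)] + \mathrm{Var}(\nu_Z^i(N_\lambda)) = O(\lambda \log \lambda), \qquad i \in \Sigma, \tag{26}$$

where $\mathbb{E}[N_\lambda \log(N_\lambda)] = O(\lambda \log \lambda)$ is not hard to check (details are given in the appendix, Lemma
7.4). Moreover, Lemma 6.5 implies, as $\lambda \to \infty$,

$$\mathrm{Var}(\eta^{i,2}_{N_\lambda}) = O\left(\mathbb{E}[(\log(N_\lambda + 1))^2]\right) = O(\sqrt{\lambda}), \tag{27}$$

where the second bound holds since $(\log(n + 1))^2 = O(\sqrt{n})$ and $\mathbb{E}[\sqrt{N_\lambda}] = O(\sqrt{\lambda})$ as $\lambda \to \infty$
(details are given in the appendix, Lemma 7.4). Hence, (25) implies for $V_i(\lambda) := \mathrm{Var}(Z^i_{N_\lambda})$

$$\begin{aligned}
V_i(\lambda) &= \mathrm{Var}\left(Z^0_{N_{\lambda p_{i0}}} + Z^1_{M_{\lambda p_{i1}}} + \eta^{i,2}_{N_\lambda}\right) \\
&= \mathrm{Var}\left(Z^0_{N_{\lambda p_{i0}}} + Z^1_{M_{\lambda p_{i1}}}\right) + O\left(\lambda^{3/4} \sqrt{\log \lambda}\right) \\
&= V_0(\lambda p_{i0}) + V_1(\lambda p_{i1}) + O\left(\lambda^{3/4} \sqrt{\log \lambda}\right),
\end{aligned} \tag{28}$$

in which the second equality holds by (26), (27) and Lemma 6.2 and the last equality holds since
$Z^0_{N_{\lambda p_{i0}}}$ and $Z^1_{M_{\lambda p_{i1}}}$ are independent (which is one of the reasons for poissonization).

Lemma 6.7 yields the refined upper bound

$$V_i(\lambda) = O(\lambda). \tag{29}$$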

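Finally, we need to deduce asymptotic results for $V_Z^i(n)$ out of (29). Since we have for both $i \in \Sigma$

$$\mathrm{Var}(Z^i_{N_\lambda}) = \mathbb{E}[V_Z^i(N_\lambda)] + \mathrm{Var}(\nu_Z^i(N_\lambda))$$

and, by the Lipschitz continuity of $\nu_Z^i$, that $\mathrm{Var}(\nu_Z^i(N_\lambda)) = O(\lambda)$, we may conclude that, as $\lambda \to \infty$,

$$\mathbb{E}[V_Z^i(N_\lambda)] = O(\lambda). \tag{30}$$

In order to apply Lemma 6.8 we need to check that

$$\Delta V_Z^i(n) = O(\sqrt{n}), \tag{31}$$

which may be done by the transfer theorem 4.3. First note that (23) and Lemma 5.3 imply for
the differences

$$\Delta V_Z^i(n) = p_{i0}\, \mathbb{E}[\Delta V_Z^0(I_n^i)] + (1 - p_{i0})\, \mathbb{E}[\Delta V_Z^1(n - I_n^i)] + \varepsilon_i(n),$$

where $\varepsilon_i$ is given by

$$\varepsilon_i(n) = \mathrm{Var}\left(\nu_Z^0(I_{n+1}^i) + \nu_Z^1(n + 1 - I_{n+1}^i)\right) - \mathrm{Var}\left(\nu_Z^0(I_n^i) + \nu_Z^1(n - I_n^i)\right).$$

The Lipschitz continuity of $\nu_Z^0$ and $\nu_Z^1$ yields $\mathrm{Var}(\nu_Z^0(I_n^i) + \nu_Z^1(n - I_n^i)) = O(n)$. Moreover,

$$\mathrm{Var}\left(\nu_Z^0(I_{n+1}^i) + \nu_Z^1(n + 1 - I_{n+1}^i)\right) = \mathrm{Var}\left(\nu_Z^0(I_n^i) + \nu_Z^1(n - I_n^i) + B\,\Delta\nu_Z^0(I_n^i) + (1 - B)\,\Delta\nu_Z^1(n - I_n^i)\right),$$

where $B$ is independent of $I_n^i$ and Bernoulli distributed with parameter $p_{i0}$. Since $\Delta\nu_Z^0$ and $\Delta\nu_Z^1$
are bounded, we may conclude by Lemma 6.2 that

$$\mathrm{Var}\left(\nu_Z^0(I_{n+1}^i) + \nu_Z^1(n + 1 - I_{n+1}^i)\right) = \mathrm{Var}\left(\nu_Z^0(I_n^i) + \nu_Z^1(n - I_n^i)\right) + O(\sqrt{n}),$$

which implies $\varepsilon_i(n) = O(\sqrt{n})$ and therefore $\Delta V_Z^i(n) = O(\sqrt{n})$ by the transfer theorem 4.3. Hence, the
depoissonization Lemma 6.8 is applicable and the assertion follows. □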
We finish the section with the proof of Theorem 6.1.

Proof of Theorem 6.1. Recall that for $n \in \mathbb{N}_0$, $i \in \Sigma$ we have

$$\mu_i := p_{i0}\delta_0 + p_{i1}\delta_1, \qquad B_n^i := B_n^{\mu_i}.$$

Moreover, we define for $n \in \mathbb{N}_0$ and $i \in \Sigma$

$$V_i(n) := \mathrm{Var}(B_n^i), \qquad \nu_i(n) := \mathbb{E}[B_n^i].$$

We start with the proof for the special cases $\mu = \mu_i$, $i \in \Sigma$. In these cases we have by definition
of $(X_n^i, Z_n^i)$ that

$$V_i(n) = \mathrm{Var}(X_n^i + Z_n^i) = \sigma^2 n \log n + O\left(n \sqrt{\log n}\right), \tag{32}$$

where the last equality holds by Lemmas 6.4, 6.9 and 6.2.

In order to obtain the result for arbitrary initial distributions $\mu$ recall that

$$B_n^\mu \stackrel{d}{=} B^0_{K_n} + B^1_{n - K_n} + n,$$

where $K_n$ is binomial $B(n, \mu(0))$ distributed.

Hence, we have by the independence of $(B_n^0)_{n \ge 0}$, $(B_n^1)_{n \ge 0}$ and $(K_n)_{n \ge 0}$

$$\begin{aligned}
\mathrm{Var}(B_n^\mu) &= \mathbb{E}[V_0(K_n)] + \mathbb{E}[V_1(n - K_n)] + \mathrm{Var}(\nu_0(K_n) + \nu_1(n - K_n)) \\
&= \sigma^2\, \mathbb{E}[K_n \log K_n + (n - K_n) \log(n - K_n)] + \mathrm{Var}(\nu_0(K_n) + \nu_1(n - K_n)) + O\left(n \sqrt{\log n}\right),
\end{aligned}$$

where the second equality holds by (32). Therefore, it only remains to show that

$$\mathbb{E}[K_n \log K_n + (n - K_n) \log(n - K_n)] = n \log n + O\left(n \sqrt{\log n}\right), \tag{33}$$

$$\mathrm{Var}(\nu_0(K_n) + \nu_1(n - K_n)) = O\left(n \sqrt{\log n}\right). \tag{34}$$

For (33) note that $x \mapsto x \log x + (1 - x) \log(1 - x)$ is bounded on $[0, 1]$ (with $0 \log 0 := 0$). Therefore,
we have

$$\begin{aligned}
&\mathbb{E}[K_n \log K_n + (n - K_n) \log(n - K_n)] - n \log n \\
&\quad = n\, \mathbb{E}\left[(K_n/n) \log(K_n/n) + (1 - K_n/n) \log(1 - K_n/n)\right] \\
&\quad = O(n),
\end{aligned}$$

which implies (33). Note that by Proposition 5.2 we have for $i \in \Sigma$ and $n \in \mathbb{N}_0$

$$\nu_i(n) = \frac{1}{H} n \log n + f_i(n),$$

where $f_0$ and $f_1$ are Lipschitz continuous functions. Since the Lipschitz continuity implies
$\mathrm{Var}(f_0(K_n) + f_1(n - K_n)) = O(n)$, it only remains to show that

$$\mathrm{Var}(K_n \log K_n + (n - K_n) \log(n - K_n)) = O(n),$$

which is an easy computation and essentially covered by the proof of lemma 15^ Thus, we leave 
the details to the reader. □ 
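6.1 Existence of the Splitting

In the analysis of the variance we work with pairs $(X_n^i, Z_n^i)$, $i \in \Sigma$, $n \in \mathbb{N}_0$, that satisfy the initial
conditions

$$X_n^i = Z_n^i = 0, \qquad i \in \Sigma,\ n \le 1, \tag{35}$$

as well as the stochastic recurrences

$$\begin{pmatrix} X_n^i \\ Z_n^i \end{pmatrix} \stackrel{d}{=} \begin{pmatrix} X^0_{I_n^i} \\ Z^0_{I_n^i} \end{pmatrix} + \begin{pmatrix} X^1_{n - I_n^i} \\ Z^1_{n - I_n^i} \end{pmatrix} + \begin{pmatrix} \eta_n^{i,1} \\ \eta_n^{i,2} \end{pmatrix}, \qquad n \ge 2,\ i \in \Sigma, \tag{36}$$

where $(X_0^0, \ldots, X_n^0, Z_0^0, \ldots, Z_n^0)$, $(X_0^1, \ldots, X_n^1, Z_0^1, \ldots, Z_n^1)$ and $I_n^i$ are independent, $\eta_n^{i,2} = n - \eta_n^{i,1}$
and $\eta_n^{i,1}$ is some constant satisfying

$$\eta_n^{i,1} = 0, \quad n \le 1, \qquad \text{and} \qquad \eta_n^{i,1} = O(n) \quad (n \to \infty).$$

We now discuss how to get $(X_n^i, Z_n^i)_{n \in \mathbb{N}_0, i \in \Sigma}$ with finite second moment that satisfy (35) and (36)
as well as

$$\mathbb{E}[X_n^i + Z_n^i] = \mathbb{E}[B_n^i] \quad \text{and} \quad \mathrm{Var}(X_n^i + Z_n^i) = \mathrm{Var}(B_n^i), \qquad n \in \mathbb{N}_0,\ i \in \Sigma. \tag{37}$$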

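By iterating (36) on the right hand side one expects

$$X_n^i = \eta_n^{i,1} + \sum_{k=1}^{\infty} \sum_{I = (i_1, \ldots, i_k) \in \{0,1\}^k} \eta^{i_k,1}_{J_I^i(n)}, \qquad Z_n^i = \eta_n^{i,2} + \sum_{k=1}^{\infty} \sum_{I = (i_1, \ldots, i_k) \in \{0,1\}^k} \eta^{i_k,2}_{J_I^i(n)}, \tag{38}$$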
where J/ (n) is some iteration of binomial distributed random variables that is generated as follows: 
For n G No and z G E let Ii{n) := where (L*)jgN is a sequence of independent Bernoulli 

B{pio) distributed random variables. Moreover, for each fc > 1, z G E and / G {0,1}^ let 
(//Q(rz),//]^(rz))„>o be an independent copy of {Ii{n),n — Ii{n))n>o- Then we define for both 
z g’ E 

j[°\n) ■-Ii{n), = n-Ii{n), 

and, for fc > 2 and I = (zi,..., z^) G {0,1}^ 

jRn) ( 4 *- "''-^(zz)). 

In the context of radix sort, $J_I^i(n)$ may be interpreted as the number of strings with prefix $I$ among
$n$ i.i.d. strings generated by a Markov source.
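
The interpretation above suggests a direct simulation of the bucket counts. The sketch below is an illustration under assumptions: it counts one bucket operation per string in every bucket of size at least two, which matches a recurrence with toll $n$ and zero cost for $n \le 1$; the function name and the parameters `mu0`, `p00`, `p10` (probability of the symbol 0 at the start, after a 0, and after a 1) are hypothetical names.

```python
import random


def bucket_operations(n, mu0, p00, p10, rng):
    """Simulate the number of bucket operations of radix sort on n strings.

    Each stack entry is a bucket (size, probability that the next symbol is 0).
    A bucket of size m >= 2 costs m operations and is split binomially; the
    sub-bucket with symbol 0 continues with p00, the other one with p10.
    """
    total = 0
    stack = [(n, mu0)]
    while stack:
        m, prob0 = stack.pop()
        if m <= 1:
            continue  # empty or singleton buckets need no further work
        total += m
        zeros = sum(1 for _ in range(m) if rng.random() < prob0)
        stack.append((zeros, p00))
        stack.append((m - zeros, p10))
    return total


rng = random.Random(1)
sample = [bucket_operations(200, 0.5, 0.4, 0.7, rng) for _ in range(20)]
```

An explicit stack is used instead of recursion so that long runs of unsplit buckets cannot exhaust the recursion limit. Averaging `sample` gives a Monte Carlo estimate of the mean; the values are random and depend on the seed.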

Now let $T_i(n) := \min\{k \ge 1 : J_I^i(n) \le 1 \text{ for all } I \in \{0,1\}^k\}$. Since $\eta_n^{i,1} = \eta_n^{i,2} = 0$ for $n \le 1$ and
$i \in \{0, 1\}$, note that all summands for $k \ge T_i$ equal zero in (38). Hence, if we have $T_i(n) < \infty$ then
the sum in (38) is finite.

We will now discuss that for every $n \in \mathbb{N}$ we have $T_i(n) < \infty$ almost surely and then use (38)
to define $(X_n^i, Z_n^i)$ and finally check that (36) and (37) hold. To this end note that

$$M_k^i(n) := \max\{J_I^i(n) : I \in \{0,1\}^k\}$$

is bounded by $n$, non-increasing in $k$, and for $M_k^i(n) \ge 2$ the probability that $M_k^i(n)$ decreases by
at least one (i.e. $M_{k+1}^i(n) \le M_k^i(n) - 1$) is at least $(2p(1 - p))^{n/2}$, $p := \max\{p_{ij} : i, j \in \Sigma\}$, which
can be seen as follows: At each step $k$ there are at most $n/2$ indices $I_1, \ldots, I_{n/2} \in \{0,1\}^k$ with
$J_I^i(n) \ge 2$ since we have

$$\sum_{I \in \{0,1\}^k} J_I^i(n) = n.$$
For each of these indices $I_j$ the probability that the next binomial splitter decreases
$\max\{J^i_{I_j 0}(n), J^i_{I_j 1}(n)\}$ by at least one is at least $2p(1 - p)$, since starting the underlying
Bernoulli chain of the splitter with 01 or 10 causes a decrease. By the independence of the
splitters belonging to $I_1, \ldots, I_{n/2}$ we obtain the lower bound $(2p(1 - p))^{n/2}$.

This yields that $T_i(n)$ is stochastically dominated by a negative binomial $\mathrm{nB}(n, (2p(1 - p))^{n/2})$
distributed random variable. In particular, we have for all $n \in \mathbb{N}$

$$\mathbb{E}[T_i(n)] \le \frac{n}{(2p(1 - p))^{n/2}} < \infty \quad \text{and} \quad \mathrm{Var}(T_i(n)) < \infty.$$

This implies that mean and variance of $X_n^i$ and $Z_n^i$ defined by (38) are finite since we have
$|\eta_n^{i,1}| \le Cn$ for some constant $C > 0$, which together with $\sum_{I \in \{0,1\}^k} J_I^i(n) = n$ yields

$$\mathbb{E}[|X_n^i|] \le |\eta_n^{i,1}| + Cn\, \mathbb{E}[|T_i(n)|] < \infty, \qquad \mathrm{Var}(X_n^i) \le \mathbb{E}\left[(|\eta_n^{i,1}| + Cn\, T_i(n))^2\right] < \infty,$$

and similar bounds for $Z_n^i$ since $\eta_n^{i,2} = O(n)$.

Hence, it only remains to show that the definition (38) implies (36) and (37). But (36) holds
by construction and is not hard to check. For (37) note that (35) and (36) imply for the sum
$S_n^i := X_n^i + Z_n^i$ in the case $d = 0$ that for both $i \in \Sigma$

$$S_n^i = 0, \qquad n \le 1,$$

and

$$S_n^i \stackrel{d}{=} S^0_{I_n^i} + S^1_{n - I_n^i} + n,$$

which uniquely defines all moments of $S_n^i$ that are finite. Since $B_n^i$ satisfies the same conditions
we obtain

$$\mathbb{E}[S_n^i] = \mathbb{E}[B_n^i] \quad \text{and} \quad \mathrm{Var}(S_n^i) = \mathrm{Var}(B_n^i).$$


7 Asymptotic Normality 

Our main result is the asymptotic normality of the number of bucket operations: 

Theorem 7.1. For the number B!^ of bucket operations under the Markov source model with 
conditions m we have 


$$\frac{B_n^\mu - \mathbb{E}[B_n^\mu]}{\sqrt{n \log n}} \stackrel{d}{\longrightarrow} \mathcal{N}(0, \sigma^2), \qquad (n \to \infty), \tag{39}$$


where > 0 is independent of the initial distribution p and given by (W - 

As in the analysis of the mean, we first derive limit laws for B^ and i?; and then transfer these 
to a limit law for B!f via O- We abbreviate for i G E and n G Nq 


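$$\nu_i(n) := \mathbb{E}[B_n^i], \qquad \sigma_i(n) := \sqrt{\mathrm{Var}(B_n^i)}.$$

Note that we have $\nu_i(0) = \nu_i(1) = \sigma_i(0) = \sigma_i(1) = 0$ and $\sigma_i(n) > 0$ for all $n \ge 2$. We define the
standardized variables by

$$Y_n^i := \frac{B_n^i - \mathbb{E}[B_n^i]}{\sigma_i(n)}, \qquad i \in \Sigma,\ n \ge 2, \tag{40}$$

and $Y_0^i := Y_1^i := 0$.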

Our proof is based on an application of the contraction method to the recursive distributional
system da-®. The Zolotarev metric used here has been studied in the context of the contraction 
method systematically in m- We only need the following properties, see Zolotarev [38l[59]: For 
distributions $\mathcal{L}(X)$, $\mathcal{L}(Y)$ on $\mathbb{R}$ the Zolotarev distance $\zeta_s$, $s > 0$, is defined by

$$\zeta_s(X, Y) := \zeta_s(\mathcal{L}(X), \mathcal{L}(Y)) := \sup_{f \in \mathcal{F}_s} |\mathbb{E}[f(X) - f(Y)]|, \tag{41}$$

where $s = m + \alpha$ with $0 < \alpha \le 1$, $m \in \mathbb{N}_0$, and

$$\mathcal{F}_s := \left\{f \in C^m(\mathbb{R}, \mathbb{R}) : |f^{(m)}(x) - f^{(m)}(y)| \le |x - y|^\alpha\right\}, \tag{42}$$

the space of $m$ times continuously differentiable functions from $\mathbb{R}$ to $\mathbb{R}$ such that the $m$-th derivative
is Hölder continuous of order $\alpha$ with Hölder constant 1. We have that $\zeta_s(X, Y) < \infty$ if all moments
of orders $1, \ldots, m$ of $X$ and $Y$ are equal and if the $s$-th absolute moments of $X$ and $Y$ are finite.
Since later on only the case $2 < s \le 3$ is used, for finiteness of $\zeta_s(X, Y)$ it is thus sufficient for
these $s$ that mean and variance of $X$ and $Y$ coincide and both have a finite absolute moment of
order $s$.

Properties of $\zeta_s$. (1) Convergence in $\zeta_s$ implies weak convergence on $\mathbb{R}$.

(2) $\zeta_s$ is $(s, +)$ ideal, i.e., we have

$$\zeta_s(X + Z, Y + Z) \le \zeta_s(X, Y), \qquad \zeta_s(cX, cY) = c^s \zeta_s(X, Y)$$

for all $Z$ being independent of $(X, Y)$ and all $c > 0$.

We will use an upper bound of $\zeta_s$ by the minimal $L_p$ metric $\ell_p$. For distributions $\mathcal{L}(X)$, $\mathcal{L}(Y)$
on $\mathbb{R}$ and $p > 0$ we have

$$\ell_p(X, Y) := \ell_p(\mathcal{L}(X), \mathcal{L}(Y)) := \inf\left\{\|X' - Y'\|_p : X' \stackrel{d}{=} X,\ Y' \stackrel{d}{=} Y\right\},$$

where $\|X\|_p := (\mathbb{E}|X|^p)^{(1/p) \wedge 1}$ denotes the $L_p$ norm. We have $\ell_p(X, Y) < \infty$ if $\|X\|_p, \|Y\|_p < \infty$.
The bound used later for $2 < s \le 3$ is, see Lemma 5.7 in [8],

$$\zeta_s(X, Y) \le \left((\mathbb{E}|X|^s)^{1 - 1/s} + (\mathbb{E}|Y|^s)^{1 - 1/s}\right) \ell_s(X, Y), \tag{43}$$

for all $X$ and $Y$ with identical mean and variance and finite absolute moments of order $s$.
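
For intuition about the minimal $L_p$ metric, the following sketch computes $\ell_p$ between two empirical distributions on $\mathbb{R}$ with the same number of atoms. It assumes $p \ge 1$, in which case the infimum over couplings on the real line is attained by the monotone (quantile) coupling, that is, by matching sorted samples. The function name is illustrative.

```python
def empirical_lp(xs, ys, p=2.0):
    """Minimal L_p distance between two empirical measures of equal size.

    Assumes p >= 1, so that pairing the order statistics is an optimal
    coupling. The exponent min(1, 1/p) mirrors the definition of the L_p norm.
    """
    if len(xs) != len(ys):
        raise ValueError("samples must have the same length")
    pairs = zip(sorted(xs), sorted(ys))
    mean_p = sum(abs(x - y) ** p for x, y in pairs) / len(xs)
    return mean_p ** min(1.0, 1.0 / p)


# The distance only depends on the two distributions, not on the pairing
# in which the samples happen to be listed.
same_law = empirical_lp([0.0, 1.0], [1.0, 0.0])
shifted = empirical_lp([0.0, 0.0], [1.0, 1.0])
```

Here `same_law` is zero because both samples describe the same distribution, while `shifted` equals the size of the shift.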
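Proposition 7.2. For both sequences $(Y_n^i)_{n \ge 0}$, $i \in \Sigma$, we have for all $2 < s \le 3$

$$\zeta_s(Y_n^i, \mathcal{N}(0, 1)) \to 0, \qquad (n \to \infty). \tag{44}$$

Proof. From the recurrences for $B_n^i$ and the normalization (40) we obtain for $i \in \Sigma$

$$Y_n^i \stackrel{d}{=} \frac{\sigma_0(I_n^i)}{\sigma_i(n)} Y^0_{I_n^i} + \frac{\sigma_1(n - I_n^i)}{\sigma_i(n)} Y^1_{n - I_n^i} + b_i(n), \qquad n \ge 2, \tag{45}$$

where

$$b_i(n) = \frac{1}{\sigma_i(n)} \left(n + \nu_0(I_n^i) + \nu_1(n - I_n^i) - \nu_i(n)\right),$$

and in (45) we have that $(Y_0^0, \ldots, Y_n^0)$, $(Y_0^1, \ldots, Y_n^1)$ and $(I_n^0, I_n^1)$ are independent.

For independent normal $\mathcal{N}(0, 1)$ distributed random variables $\mathcal{N}_0, \mathcal{N}_1$, also independent of
$(I_n^0, I_n^1)$, we define

$$Q_n^i := \frac{\sigma_0(I_n^i)}{\sigma_i(n)} \mathcal{N}_0 + \frac{\sigma_1(n - I_n^i)}{\sigma_i(n)} \mathcal{N}_1 + b_i(n), \qquad n \ge 2. \tag{46}$$

Note that we have $\mathbb{E}[Q_n^i] = 0$ and $\mathrm{Var}(Q_n^i) = 1$ for all $n \ge 2$. For the variance, this is seen by
conditioning on $I_n^i$ in (45) and (46) and using that $Y_j^i$ and $\mathcal{N}_i$ have the same variance 1 for all
$j \ge 2$ and that for $j \in \{0, 1\}$ the coefficients $\sigma_0(j)/\sigma_i(n)$ are zero, whereas for $j \in \{n - 1, n\}$ the
coefficients $\sigma_1(n - j)/\sigma_i(n)$ are zero. Hence, the Zolotarev distances $\zeta_s(Y_n^i, Q_n^i)$, $\zeta_s(Q_n^i, \mathcal{N}_i)$ and
$\zeta_s(Y_n^i, \mathcal{N}_i)$ are finite for all $n \ge 2$ and $i \in \Sigma$, where we have $2 < s \le 3$.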
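We denote by $\mathcal{N}$ another normal $\mathcal{N}(0, 1)$ distributed random variable. Then we have

$$\zeta_s(Y_n^i, \mathcal{N}) \le \zeta_s(Y_n^i, Q_n^i) + \zeta_s(Q_n^i, \mathcal{N}).$$

In the first step we show that $\zeta_s(Q_n^i, \mathcal{N}) \to 0$ as $n \to \infty$ for both $i \in \Sigma$. Note that $\|Q_n^i\|_s$ is
uniformly bounded in $n \ge 2$ and $i \in \Sigma$. Hence, by (43) there exists a constant $C > 0$ such that
$\zeta_s(Q_n^i, \mathcal{N}) \le C \ell_s(Q_n^i, \mathcal{N})$. Thus, it is sufficient to show $\ell_s(Q_n^i, \mathcal{N}) \to 0$. With

$$\mathcal{N} \stackrel{d}{=} \sqrt{p_{i0}}\, \mathcal{N}_0 + \sqrt{1 - p_{i0}}\, \mathcal{N}_1$$

we obtain

$$\ell_s(Q_n^i, \mathcal{N}) \le \left\|\left(\frac{\sigma_0(I_n^i)}{\sigma_i(n)} - \sqrt{p_{i0}}\right) \mathcal{N}_0\right\|_s + \left\|\left(\frac{\sigma_1(n - I_n^i)}{\sigma_i(n)} - \sqrt{1 - p_{i0}}\right) \mathcal{N}_1\right\|_s + \|b_i(n)\|_s. \tag{47}$$

For the first summand in (47) we have, by the strong law of large numbers and the variance
expansion (32), that $\sigma_0(I_n^i)/\sigma_i(n) \to \sqrt{p_{i0}}$ almost surely. Since $\mathcal{N}_0$ is independent of $I_n^i$ and
$\|\mathcal{N}_0\|_s < \infty$ we obtain from dominated convergence that this first summand tends to zero. By
similar arguments we also have that the second summand in (47) tends to zero. The third summand
$\|b_i(n)\|_s$ is bounded as follows: With the decomposition $\nu_i(n) = \frac{1}{H} n \log n + f_i(n)$ from Proposition 5.2
and $h(x) = x \log x$ as in Lemma 7.3 of the Appendix, we have

$$\begin{aligned}
b_i(n) = \frac{1}{\sigma_i(n)} \Big( &\frac{n}{H} \left\{h(I_n^i/n) - \mathbb{E}[h(I_n^i/n)] + h((n - I_n^i)/n) - \mathbb{E}[h((n - I_n^i)/n)]\right\} \\
&+ f_0(I_n^i) - \mathbb{E}[f_0(I_n^i)] + f_1(n - I_n^i) - \mathbb{E}[f_1(n - I_n^i)] \Big).
\end{aligned}$$

With $\sigma_i(n) = \Theta(\sqrt{n \log n})$ and (59) the contributions of all summands involving the function $h$
are $O(1/\sqrt{\log n})$ in the $L_s$-norm, hence we have

$$\|b_i(n)\|_s \le \frac{1}{\sigma_i(n)} \left(\|f_0(I_n^i) - \mathbb{E}[f_0(I_n^i)]\|_s + \|f_1(n - I_n^i) - \mathbb{E}[f_1(n - I_n^i)]\|_s\right) + O(1/\sqrt{\log n}), \qquad (n \to \infty).$$

Furthermore, to bound $\|f_0(I_n^i) - \mathbb{E}[f_0(I_n^i)]\|_s$ we use an independent copy $\widehat{I}_n^i$ of $I_n^i$. Then, by
Jensen's inequality for conditional expectations and the Lipschitz property of $f_i$ in Proposition
5.2 (with Lipschitz constant bounded by $C$)

$$\begin{aligned}
\|f_0(I_n^i) - \mathbb{E}[f_0(I_n^i)]\|_s &= \left\|\mathbb{E}\left[f_0(I_n^i) - f_0(\widehat{I}_n^i) \,\middle|\, I_n^i\right]\right\|_s \\
&\le \|f_0(I_n^i) - f_0(\widehat{I}_n^i)\|_s \\
&\le C \|I_n^i - \widehat{I}_n^i\|_s \\
&\le 2C \|I_n^i - \mathbb{E}[I_n^i]\|_s \\
&= O(\sqrt{n}).
\end{aligned} \tag{48}$$

Since $\|f_1(n - I_n^i) - \mathbb{E}[f_1(n - I_n^i)]\|_s$ is bounded analogously and $\sigma_i(n) = \Theta(\sqrt{n \log n})$ we obtain
altogether $\|b_i(n)\|_s = O(1/\sqrt{\log n}) \to 0$ as $n \to \infty$ and for both $i \in \Sigma$.

This completes the estimate for the first step $\zeta_s(Q_n^i, \mathcal{N}) \to 0$ as $n \to \infty$.

Now, we denote the distances $d_i(n) := \zeta_s(Y_n^i, \mathcal{N})$ for $n \ge 2$, and $d_i(0) := d_i(1) := 0$ for $i \in \Sigma$.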
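Conditioning on $I_n^i$ and using that $\zeta_s$ is $(s, +)$ ideal we obtain for all $n \ge 2$

$$\begin{aligned}
d_i(n) &\le \zeta_s(Y_n^i, Q_n^i) + o(1) \\
&= \zeta_s\left(\frac{\sigma_0(I_n^i)}{\sigma_i(n)} Y^0_{I_n^i} + \frac{\sigma_1(n - I_n^i)}{\sigma_i(n)} Y^1_{n - I_n^i} + b_i(n),\ \frac{\sigma_0(I_n^i)}{\sigma_i(n)} \mathcal{N}_0 + \frac{\sigma_1(n - I_n^i)}{\sigma_i(n)} \mathcal{N}_1 + b_i(n)\right) + o(1) \\
&\le \sum_{j=0}^{n} \binom{n}{j} p_{i0}^j (1 - p_{i0})^{n-j}\, \zeta_s\left(\frac{\sigma_0(j)}{\sigma_i(n)} Y_j^0 + \frac{\sigma_1(n - j)}{\sigma_i(n)} Y_{n-j}^1 + \mathbf{1}_{\{I_n^i = j\}} b_i(n),\ \frac{\sigma_0(j)}{\sigma_i(n)} \mathcal{N}_0 + \frac{\sigma_1(n - j)}{\sigma_i(n)} \mathcal{N}_1 + \mathbf{1}_{\{I_n^i = j\}} b_i(n)\right) + o(1) \\
&\le \sum_{j=0}^{n} \binom{n}{j} p_{i0}^j (1 - p_{i0})^{n-j} \left[\left(\frac{\sigma_0(j)}{\sigma_i(n)}\right)^s \zeta_s(Y_j^0, \mathcal{N}_0) + \left(\frac{\sigma_1(n - j)}{\sigma_i(n)}\right)^s \zeta_s(Y_{n-j}^1, \mathcal{N}_1)\right] + o(1) \\
&= \mathbb{E}\left[\left(\frac{\sigma_0(I_n^i)}{\sigma_i(n)}\right)^s d_0(I_n^i) + \left(\frac{\sigma_1(n - I_n^i)}{\sigma_i(n)}\right)^s d_1(n - I_n^i)\right] + o(1).
\end{aligned} \tag{49}$$

With $d(n) := d_0(n) \vee d_1(n)$ we obtain for both $i \in \Sigma$ that

$$d_i(n) \le \mathbb{E}\left[\mathbf{1}_{\{1 \le I_n^i \le n - 1\}} \left(\left(\frac{\sigma_0(I_n^i)}{\sigma_i(n)}\right)^s + \left(\frac{\sigma_1(n - I_n^i)}{\sigma_i(n)}\right)^s\right)\right] \sup_{1 \le j \le n - 1} d(j) + \left((1 - p_{i0})^n + p_{i0}^n\right) d(n) + o(1). \tag{50}$$

With

$$\xi(n) := \max_{i \in \Sigma} \mathbb{E}\left[\mathbf{1}_{\{1 \le I_n^i \le n - 1\}} \left(\left(\frac{\sigma_0(I_n^i)}{\sigma_i(n)}\right)^s + \left(\frac{\sigma_1(n - I_n^i)}{\sigma_i(n)}\right)^s\right)\right], \qquad \varepsilon(n) := \max_{i \in \Sigma} \left\{(1 - p_{i0})^n + p_{i0}^n\right\},$$

we obtain by taking the maximum of the right hand sides in (50)

$$d(n) \le \frac{\xi(n)}{1 - \varepsilon(n)} \sup_{1 \le j \le n - 1} d(j) + o(1). \tag{51}$$

We have $\varepsilon(n) \to 0$ and, since $s > 2$ and $p_{i0} \in (0, 1)$ for both $i \in \Sigma$,

$$\xi := \lim_{n \to \infty} \xi(n) = \max_{i \in \Sigma} \left\{p_{i0}^{s/2} + (1 - p_{i0})^{s/2}\right\} < 1. \tag{52}$$

With (51) this implies that $(d(n))_{n \ge 1}$ remains bounded. We denote $\beta := \sup_{n \ge 0} d(n)$ and $\eta :=
\limsup_{n \to \infty} d(n)$. Hence, we have $\beta, \eta < \infty$ and for any $\varepsilon > 0$ there exists an $n_0 \ge 2$ such that for
all $n \ge n_0$ we have $d(n) \le \eta + \varepsilon$. From (49) we obtain with (52) for both $i \in \Sigma$

$$\begin{aligned}
d_i(n) &\le \mathbb{E}\left[\mathbf{1}_{\{I_n^i < n_0\} \cup \{I_n^i > n - n_0\}} \left(\left(\frac{\sigma_0(I_n^i)}{\sigma_i(n)}\right)^s + \left(\frac{\sigma_1(n - I_n^i)}{\sigma_i(n)}\right)^s\right)\right] \beta && (53) \\
&\quad + \mathbb{E}\left[\mathbf{1}_{\{n_0 \le I_n^i \le n - n_0\}} \left(\left(\frac{\sigma_0(I_n^i)}{\sigma_i(n)}\right)^s + \left(\frac{\sigma_1(n - I_n^i)}{\sigma_i(n)}\right)^s\right)\right] (\eta + \varepsilon) + o(1) && (54) \\
&\le (\xi + o(1))(\eta + \varepsilon) + o(1) && (55)
\end{aligned}$$

with appropriate $o(1)$ terms. Maximizing over $i \in \Sigma$ this yields $d(n) \le o(1) + (\xi + o(1))(\eta + \varepsilon)$
and with $n \to \infty$

$$\eta \le \xi (\eta + \varepsilon).$$

Since $\varepsilon > 0$ can be chosen arbitrarily small and $\xi < 1$ we obtain $\eta = 0$, i.e. $\zeta_s(Y_n^i, \mathcal{N}) \to 0$ as $n \to \infty$ for
both $i \in \Sigma$. □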
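
The contraction argument rests on the limit in (52) being strictly smaller than one, which is exactly where $s > 2$ enters. The following sketch evaluates the limiting coefficient for given transition probabilities; the function name and the argument names `p00`, `p10` (for $p_{00}$ and $p_{10}$) are illustrative.

```python
import math


def contraction_coefficient(p00, p10, s):
    """Return max over i of p_{i0}^(s/2) + (1 - p_{i0})^(s/2), as in (52)."""
    return max(p ** (s / 2.0) + (1.0 - p) ** (s / 2.0) for p in (p00, p10))


xi_contracting = contraction_coefficient(0.3, 0.6, 2.5)  # s > 2: strictly below 1
xi_boundary = contraction_coefficient(0.3, 0.6, 2.0)     # s = 2: no contraction
```

For $s = 2$ the coefficient equals $p_{i0} + (1 - p_{i0}) = 1$, so no contraction is available in that case; this is consistent with the restriction to $2 < s \le 3$ in Proposition 7.2.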
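Proof of Theorem 7.1. We write

$$\frac{B_n^\mu - \mathbb{E}[B_n^\mu]}{\sqrt{n \log n}} = \frac{B^0_{K_n} - \nu_0(K_n) + B^1_{n - K_n} - \nu_1(n - K_n)}{\sqrt{n \log n}} + \frac{\nu_0(K_n) + \nu_1(n - K_n) + n - \mathbb{E}[B_n^\mu]}{\sqrt{n \log n}}.$$

By Slutsky's lemma it is sufficient to show, as $n \to \infty$,

$$\frac{B^0_{K_n} - \nu_0(K_n) + B^1_{n - K_n} - \nu_1(n - K_n)}{\sqrt{n \log n}} \stackrel{d}{\longrightarrow} \mathcal{N}(0, \sigma^2), \tag{56}$$

$$\frac{\nu_0(K_n) + \nu_1(n - K_n) + n - \mathbb{E}[B_n^\mu]}{\sqrt{n \log n}} \stackrel{\mathbb{P}}{\longrightarrow} 0. \tag{57}$$

For showing (56) note that by Proposition 7.2 $(B_n^i - \mathbb{E}[B_n^i])/\sqrt{n \log n} \to \mathcal{N}(0, \sigma^2)$ in distribution
for both $i \in \Sigma$. We set $A_n := [\mu_0 n - n^{2/3}, \mu_0 n + n^{2/3}] \cap \mathbb{N}_0$ and $A_n^c := \{0, \ldots, n\} \setminus A_n$, where $\mu_0 := \mu(0)$. Then by
Chernoff's bound (or the central limit theorem) we have $\mathbb{P}(K_n \in A_n) \to 1$. For all $x \in \mathbb{R}$ we have

$$\mathbb{P}\left(\frac{B^0_{K_n} - \nu_0(K_n) + B^1_{n - K_n} - \nu_1(n - K_n)}{\sqrt{n \log n}} \le x\right) = o(1) + \sum_{j \in A_n} \mathbb{P}(K_n = j)\, \mathbb{P}\left(\frac{B_j^0 - \nu_0(j) + B_{n-j}^1 - \nu_1(n - j)}{\sqrt{n \log n}} \le x\right).$$

For $j \in A_n$ we have $\sqrt{j \log j}/\sqrt{n \log n} \to \sqrt{\mu_0}$ and $\sqrt{(n - j) \log(n - j)}/\sqrt{n \log n} \to \sqrt{1 - \mu_0}$.
Hence, we have $(B_j^0 - \nu_0(j))/\sqrt{n \log n} \to \mathcal{N}(0, \mu_0 \sigma^2)$ and $(B_{n-j}^1 - \nu_1(n - j))/\sqrt{n \log n} \to \mathcal{N}(0, (1 -
\mu_0)\sigma^2)$ in distribution and the two summands are independent. Together, denoting by $N_{0,\sigma^2}$ an
$\mathcal{N}(0, \sigma^2)$ distributed random variable, we obtain

$$\mathbb{P}\left(\frac{B^0_{K_n} - \nu_0(K_n) + B^1_{n - K_n} - \nu_1(n - K_n)}{\sqrt{n \log n}} \le x\right) = o(1) + \sum_{j \in A_n} \mathbb{P}(K_n = j)\left(\mathbb{P}(N_{0,\sigma^2} \le x) + o(1)\right) \to \mathbb{P}(N_{0,\sigma^2} \le x),$$

where the latter convergence is justified by dominated convergence. This shows (56).

For (57) note that the distributional recurrence for $B_n^\mu$ implies

$$\mathbb{E}[B_n^\mu] = \mathbb{E}[\nu_0(K_n)] + \mathbb{E}[\nu_1(n - K_n)] + n.$$

Hence, with the decomposition $\nu_i(n) = \frac{1}{H} n \log n + f_i(n)$ from Proposition 5.2 and $h(x) = x \log x$, we have

$$\begin{aligned}
\left\|\frac{\nu_0(K_n) + \nu_1(n - K_n) + n - \mathbb{E}[B_n^\mu]}{\sqrt{n \log n}}\right\|_3 &= \frac{1}{\sqrt{n \log n}} \left\|\nu_0(K_n) - \mathbb{E}[\nu_0(K_n)] + \nu_1(n - K_n) - \mathbb{E}[\nu_1(n - K_n)]\right\|_3 \\
&\le \frac{1}{H \sqrt{n \log n}} \left\|h(K_n) - \mathbb{E}[h(K_n)] + h(n - K_n) - \mathbb{E}[h(n - K_n)]\right\|_3 \\
&\quad + \frac{1}{\sqrt{n \log n}} \|f_0(K_n) - \mathbb{E}[f_0(K_n)]\|_3 + \frac{1}{\sqrt{n \log n}} \|f_1(n - K_n) - \mathbb{E}[f_1(n - K_n)]\|_3.
\end{aligned}$$

Since $h(K_n) + h(n - K_n) = n\left(h(K_n/n) + h(1 - K_n/n)\right) + n \log n$, an easy calculation reveals (details are
given in the appendix, equation (59))

$$\left\|h(K_n) - \mathbb{E}[h(K_n)] + h(n - K_n) - \mathbb{E}[h(n - K_n)]\right\|_3 = O(\sqrt{n}).$$

The terms $\|f_0(K_n) - \mathbb{E}[f_0(K_n)]\|_3$ and $\|f_1(n - K_n) - \mathbb{E}[f_1(n - K_n)]\|_3$ are also of the order $O(\sqrt{n})$
by the argument used in (48). Altogether we have

$$\left\|\frac{\nu_0(K_n) + \nu_1(n - K_n) + n - \mathbb{E}[B_n^\mu]}{\sqrt{n \log n}}\right\|_3 = O\left(\frac{1}{\sqrt{\log n}}\right),$$

which implies (57). □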
Appendix


Asymptotics of the Binomial and Poisson distribution 


The appendix is meant to cover some elementary asymptotic moment calculations of the binomial 
and Poisson distribution. These calculations were made for the sake of completeness and may be 
removed in the published version of this paper. 

The following approximations are immediate consequences of the concentration of the binomial
distribution. Recall $x \log x = 0$ for $x = 0$.


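Lemma 7.3. Let $p \in (0, 1)$, $h(x) := x \log x$ for $x \in [0, 1]$ and $X_{n,p}$ be binomial $B(n, p)$ distributed
for $n \in \mathbb{N}$. Then we have as $n \to \infty$

$$\mathbb{E}\left[\left|\log\left(\frac{X_{n,p} + 1}{n + 1}\right) - \log p\right|\right] = O(n^{-1/2}), \tag{58}$$

$$\left\|h(X_{n,p}/n) - \mathbb{E}[h(X_{n,p}/n)]\right\|_3 = O(n^{-1/2}), \tag{59}$$

$$\mathbb{E}[h(X_{n,p}/n) - h(p)] = O(n^{-2/3}). \tag{60}$$

Proof. Proof of (58): Note that we have for all $\varepsilon \in (0, 1)$, by the mean value theorem,

$$|\log(x) - \log(y)| \le \varepsilon^{-1} |x - y|, \qquad x, y \in [\varepsilon, 1].$$

This yields

$$\begin{aligned}
\mathbb{E}\left[\left|\log\left(\frac{X_{n,p} + 1}{n + 1}\right) - \log p\right|\right] &\le \mathbb{E}\left[\left|\log\left(\frac{X_{n,p} + 1}{n + 1}\right) - \log p\right| \mathbf{1}_{\{X_{n,p} > np/2\}}\right] + O\left(\log n\, \mathbb{P}(X_{n,p} \le np/2)\right) \\
&\le \frac{2}{p}\, \mathbb{E}\left[\left|\frac{X_{n,p} + 1 - np - p}{n + 1}\right|\right] + O\left(\log n\, \mathbb{P}(X_{n,p} \le np/2)\right).
\end{aligned}$$

The assertion follows since $\mathbb{E}[|(X_{n,p} - np)/\sqrt{np(1 - p)}|]$ converges to the first absolute moment of
the standard normal distribution and $\log n\, \mathbb{P}(X_{n,p} \le np/2) = o(n^{-1/2})$ by Chernoff's bound.

Proof of (59): First note that $h$ is bounded on $[0, 1]$ and that we have for all $\varepsilon \in (0, 1)$

$$|h'(x)| \le \log(1/\varepsilon) + 1, \qquad x \in [\varepsilon, 1].$$

In particular, we obtain by the mean value theorem that

$$|h(x) - h(y)| \le (\log(1/\varepsilon) + 1)|x - y|, \qquad x, y \in [\varepsilon, 1]. \tag{61}$$

With an independent copy $\widehat{X}_{n,p}$ of $X_{n,p}$ we obtain by Jensen's inequality and (61)

$$\begin{aligned}
\left\|h(X_{n,p}/n) - \mathbb{E}[h(X_{n,p}/n)]\right\|_3^3 &= \mathbb{E}\left[\left|\mathbb{E}\left[h(X_{n,p}/n) - h(\widehat{X}_{n,p}/n) \,\middle|\, X_{n,p}\right]\right|^3\right] \\
&\le \mathbb{E}\left[\left|h(X_{n,p}/n) - h(\widehat{X}_{n,p}/n)\right|^3\right] \\
&= \mathbb{E}\left[\left|h(X_{n,p}/n) - h(\widehat{X}_{n,p}/n)\right|^3 \mathbf{1}_{\{X_{n,p}, \widehat{X}_{n,p} \in [np/2, n]\}}\right] + O(\mathbb{P}(X_{n,p} < np/2)) \\
&\le (\log(2/p) + 1)^3\, \mathbb{E}\left[\left|X_{n,p}/n - \widehat{X}_{n,p}/n\right|^3\right] + O(\mathbb{P}(X_{n,p} < np/2)) \\
&\le (\log(2/p) + 1)^3\, n^{-3/2} \left(2 \left\|(X_{n,p} - np)/\sqrt{n}\right\|_3\right)^3 + O(\mathbb{P}(X_{n,p} < np/2)).
\end{aligned}$$

The assertion follows by Chernoff's bound on $\mathbb{P}(X_{n,p} < np/2)$ and $\|(X_{n,p} - np)/\sqrt{n}\|_3 \to \|N\|_3$, where $N$
is $\mathcal{N}(0, p(1 - p))$ distributed.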


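Proof of (60): It is sufficient to show that

1. $h(p) - p\, \mathbb{E}[\log(X_{n,p}/n) \mathbf{1}_{\{X_{n,p} \ge 1\}}] = O(n^{-2/3})$,

2. $\mathbb{E}[h(X_{n,p}/n) - p \log(X_{n,p}/n) \mathbf{1}_{\{X_{n,p} \ge 1\}}] = O(n^{-2/3})$.

For the first part note that we have

$$\begin{aligned}
\left|h(p) - p\, \mathbb{E}[\log(X_{n,p}/n) \mathbf{1}_{\{X_{n,p} \ge 1\}}]\right| &= p \left|\mathbb{E}\left[\log\left(\frac{X_{n,p}}{np}\right) \mathbf{1}_{\{X_{n,p} \ge 1\}}\right]\right| + O((1 - p)^n) \\
&= p \left|\mathbb{E}\left[\left(\log\left(1 + \frac{X_{n,p} - np}{np}\right) - \frac{X_{n,p} - np}{np}\right) \mathbf{1}_{\{X_{n,p} \ge 1\}}\right]\right| + O((1 - p)^n) \\
&\le p\, \mathbb{E}\left[\left|\log\left(1 + \frac{X_{n,p} - np}{np}\right) - \frac{X_{n,p} - np}{np}\right| \mathbf{1}_{\{|X_{n,p} - np| \le n^{2/3}\}}\right] \\
&\quad + (\log(np) + 1/p)\, \mathbb{P}(|X_{n,p} - np| > n^{2/3}) + O((1 - p)^n).
\end{aligned}$$

Since we have $\log(1 + x) - x = O(x^2)$ for $x \to 0$ and $\mathbb{P}(|X_{n,p} - np| > n^{2/3}) = o(n^{-1})$ by Chernoff's
bound, we may conclude that

$$h(p) - p\, \mathbb{E}[\log(X_{n,p}/n) \mathbf{1}_{\{X_{n,p} \ge 1\}}] = O(n^{-2/3}).$$

In order to obtain the second bound, note that

$$\begin{aligned}
&\mathbb{E}[h(X_{n,p}/n) - p \log(X_{n,p}/n) \mathbf{1}_{\{X_{n,p} \ge 1\}}] \\
&\quad = \mathbb{E}\left[\left(h(X_{n,p}/n) - p \log(X_{n,p}/n)\right) \mathbf{1}_{\{X_{n,p} \ge 1\}}\right] + O((1 - p)^n) \\
&\quad = \frac{1}{\sqrt{n}}\, \mathbb{E}\left[\frac{X_{n,p} - np}{\sqrt{n}} \log\left(\frac{X_{n,p}}{n}\right) \mathbf{1}_{\{X_{n,p} \ge 1\}}\right] + O((1 - p)^n) \\
&\quad = \frac{1}{\sqrt{n}}\, \mathbb{E}\left[\frac{X_{n,p} - np}{\sqrt{n}} \log\left(\frac{X_{n,p}}{n}\right) \mathbf{1}_{\{|X_{n,p} - np| \le n^{2/3}\}}\right] + O(n^{-2/3}) \\
&\quad = \frac{1}{\sqrt{n}}\, \mathbb{E}\left[\frac{X_{n,p} - np}{\sqrt{n}} \log(p)\, \mathbf{1}_{\{|X_{n,p} - np| \le n^{2/3}\}}\right] + \frac{1}{\sqrt{n}}\, \mathbb{E}\left[\frac{X_{n,p} - np}{\sqrt{n}} \log\left(1 + \frac{X_{n,p} - np}{np}\right) \mathbf{1}_{\{|X_{n,p} - np| \le n^{2/3}\}}\right] + O(n^{-2/3}).
\end{aligned}$$

Since $\log(1 + x) = O(x)$ as $x \to 0$ and $\mathbb{E}[|(X_{n,p} - np)/\sqrt{n}|]$ converges to the first absolute moment
of the $\mathcal{N}(0, p(1 - p))$ distribution, we obtain for the second summand

$$\frac{1}{\sqrt{n}}\, \mathbb{E}\left[\frac{X_{n,p} - np}{\sqrt{n}} \log\left(1 + \frac{X_{n,p} - np}{np}\right) \mathbf{1}_{\{|X_{n,p} - np| \le n^{2/3}\}}\right] = O(n^{-5/6}).$$

For the first summand note that $\mathbb{E}[(X_{n,p} - np)/\sqrt{n}] = 0$, which implies

$$\begin{aligned}
\frac{1}{\sqrt{n}}\, \mathbb{E}\left[\frac{X_{n,p} - np}{\sqrt{n}} \log(p)\, \mathbf{1}_{\{|X_{n,p} - np| \le n^{2/3}\}}\right] &= -\frac{1}{\sqrt{n}}\, \mathbb{E}\left[\frac{X_{n,p} - np}{\sqrt{n}} \log(p)\, \mathbf{1}_{\{|X_{n,p} - np| > n^{2/3}\}}\right] \\
&= O\left(\mathbb{P}(|X_{n,p} - np| > n^{2/3})\right).
\end{aligned}$$

Hence, we obtain $\mathbb{E}[h(X_{n,p}/n) - p \log(X_{n,p}/n) \mathbf{1}_{\{X_{n,p} \ge 1\}}] = O(n^{-2/3})$, which combined with the
first result yields the assertion. □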

The next lemma provides asymptotic results for the Poisson distribution that are needed for
the analysis of the variance:
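Lemma 7.4. For $\lambda > 0$ let $N_\lambda$ be Poisson($\lambda$) distributed. Then we have for all $\alpha, \beta > 0$ as $\lambda \to \infty$

$$\mathbb{E}[N_\lambda^\alpha] = O(\lambda^\alpha),$$

$$\mathbb{E}\left[N_\lambda^\alpha (\log N_\lambda)^\beta\right] = O\left(\lambda^\alpha (\log \lambda)^\beta\right).$$

Proof. We start with the analysis of $\mathbb{E}[N_\lambda^\alpha]$: For $\alpha \in \mathbb{N}$ the assertion follows by induction and the
fact that for every $n \in \mathbb{N}_0$ we have

$$\mathbb{E}\left[\prod_{j=0}^{n} (N_\lambda - j)\right] = \lambda^{n+1}.$$

For $\alpha \in (0, 1)$ note that $x \mapsto x^\alpha$ is concave on $[0, \infty)$ and therefore, by Jensen's inequality,

$$\mathbb{E}[N_\lambda^\alpha] \le (\mathbb{E}[N_\lambda])^\alpha = \lambda^\alpha.$$

Finally, for $\alpha \in (1, \infty) \setminus \mathbb{N}$ we have that $x \mapsto x^{\alpha/\lceil \alpha \rceil}$ is concave on $[0, \infty)$, which yields

$$\mathbb{E}[N_\lambda^\alpha] \le \left(\mathbb{E}\left[N_\lambda^{\lceil \alpha \rceil}\right]\right)^{\alpha/\lceil \alpha \rceil}$$

and the assertion follows by the results for $\alpha \in \mathbb{N}$.

For the second part of the proof we use the following decomposition

$$\begin{aligned}
\mathbb{E}\left[N_\lambda^\alpha (\log N_\lambda)^\beta\right] &= \mathbb{E}\left[N_\lambda^\alpha (\log N_\lambda)^\beta \mathbf{1}_{\{N_\lambda \le \lambda^{\alpha+1}\}}\right] + \mathbb{E}\left[N_\lambda^\alpha (\log N_\lambda)^\beta \mathbf{1}_{\{N_\lambda > \lambda^{\alpha+1}\}}\right] \\
&\le (\alpha + 1)^\beta (\log \lambda)^\beta\, \mathbb{E}[N_\lambda^\alpha] + \mathbb{E}\left[N_\lambda^\alpha (\log N_\lambda)^\beta \mathbf{1}_{\{N_\lambda > \lambda^{\alpha+1}\}}\right] \\
&= O\left(\lambda^\alpha (\log \lambda)^\beta\right) + \mathbb{E}\left[N_\lambda^\alpha (\log N_\lambda)^\beta \mathbf{1}_{\{N_\lambda > \lambda^{\alpha+1}\}}\right],
\end{aligned}$$

where the last step holds since $\mathbb{E}[N_\lambda^\alpha] = O(\lambda^\alpha)$. Hence, it is sufficient to show that

$$\mathbb{E}\left[N_\lambda^\alpha (\log N_\lambda)^\beta \mathbf{1}_{\{N_\lambda > \lambda^{\alpha+1}\}}\right] = O(\lambda^\alpha).$$

Since we have $n^\alpha (\log n)^\beta \le C_{\alpha,\beta}\, n^{3\alpha/2}$ for a sufficiently large constant $C_{\alpha,\beta} > 0$ and all $n \in \mathbb{N}_0$, we
obtain

$$\mathbb{E}\left[N_\lambda^\alpha (\log N_\lambda)^\beta \mathbf{1}_{\{N_\lambda > \lambda^{\alpha+1}\}}\right] \le C_{\alpha,\beta}\, \mathbb{E}\left[N_\lambda^{3\alpha/2} \mathbf{1}_{\{N_\lambda > \lambda^{\alpha+1}\}}\right] \le C_{\alpha,\beta} \sqrt{\mathbb{E}[N_\lambda^{3\alpha}]\, \mathbb{P}(N_\lambda > \lambda^{\alpha+1})},$$

where the last inequality holds by the Cauchy-Schwarz inequality. Together with the previous
result $\mathbb{E}[N_\lambda^{3\alpha}] = O(\lambda^{3\alpha})$ and Markov's inequality this yields

$$\mathbb{E}\left[N_\lambda^\alpha (\log N_\lambda)^\beta \mathbf{1}_{\{N_\lambda > \lambda^{\alpha+1}\}}\right] = O(\lambda^\alpha)$$

and the assertion follows. □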
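
Two ingredients of this proof, the factorial moment identity and the Jensen bound for $\alpha \in (0, 1)$, can be checked numerically. The sketch below truncates the Poisson sum at a point far in the upper tail; the function names and the truncation rule are assumptions made for illustration only.

```python
import math


def poisson_expectation(lam, g, width=12.0):
    """Approximate E[g(N)] for N ~ Poisson(lam) by a truncated sum.

    The truncation point lam + width * sqrt(lam) + 10 is a choice of this
    sketch; the neglected tail is negligible for moderate lam.
    """
    hi = int(lam + width * math.sqrt(lam)) + 10
    total = 0.0
    for k in range(hi + 1):
        log_pmf = -lam + k * math.log(lam) - math.lgamma(k + 1)
        total += math.exp(log_pmf) * g(k)
    return total


def falling_factorial(k, r):
    """k (k - 1) ... (k - r + 1), the product of r consecutive factors."""
    out = 1
    for j in range(r):
        out *= (k - j)
    return out


lam = 20.0
third_factorial_moment = poisson_expectation(lam, lambda k: falling_factorial(k, 3))
mean_sqrt = poisson_expectation(lam, math.sqrt)
```

With these definitions `third_factorial_moment` should agree with $\lambda^3$ up to truncation and rounding error, and `mean_sqrt` should not exceed $\sqrt{\lambda}$, in line with Jensen's inequality for the concave map $x \mapsto \sqrt{x}$.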